
Computer Science

New submissions


New submissions for Mon, 3 Jun 24

[1]  arXiv:2405.20347 [pdf, other]
Title: Small Language Models for Application Interactions: A Case Study
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We study the efficacy of Small Language Models (SLMs) in facilitating application usage through natural language interactions. Our focus here is on a particular internal application used in Microsoft for cloud supply chain fulfilment. Our experiments show that small models can outperform much larger ones in terms of both accuracy and running time, even when fine-tuned on small datasets. Alongside these results, we also highlight SLM-based system design considerations.

[2]  arXiv:2405.20350 [pdf, other]
Title: Linear Function Approximation as a Computationally Efficient Method to Solve Classical Reinforcement Learning Challenges
Authors: Hari Srikanth
Subjects: Machine Learning (cs.LG)

Neural-network-based approximations of the value function make up the core of leading policy-based methods such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO). While this adds significant value when dealing with very complex environments, we note that in environments with sufficiently small state and action spaces, a computationally expensive neural network architecture offers only marginal improvement over simpler value approximation methods. We present an implementation of Natural Actor Critic algorithms with actor updates through Natural Policy Gradient methods. This paper proposes that Natural Policy Gradient (NPG) methods with Linear Function Approximation for value approximation may surpass the performance and speed of neural-network-based models such as TRPO and PPO within these environments. On the reinforcement learning benchmarks Cart Pole and Acrobot, we observe that our algorithm trains much faster than complex neural network architectures and obtains an equivalent or greater result. This allows us to recommend the use of NPG methods with Linear Function Approximation over TRPO and PPO for both traditional and sparse-reward low-dimensional problems.
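To make the idea of a linear value approximation concrete, here is a minimal sketch of one semi-gradient TD(0) update for a value function of the form V(s) = w · phi(s). It is a generic textbook critic update, not the algorithm from this paper; the function name `td0_linear_update`, the feature vectors, and the default step size and discount are assumptions made for illustration.

```python
def td0_linear_update(weights, features, next_features, reward,
                      gamma=0.99, alpha=0.1, done=False):
    """Return updated weights after one semi-gradient TD(0) step.

    The value estimate is linear: V(s) = sum_i weights[i] * features[i].
    All names and defaults here are illustrative assumptions.
    """
    value = sum(w * x for w, x in zip(weights, features))
    # A terminal transition has no bootstrap term.
    next_value = 0.0 if done else sum(
        w * x for w, x in zip(weights, next_features))
    td_error = reward + gamma * next_value - value
    # For a linear model, the gradient of V(s) w.r.t. the weights is phi(s).
    return [w + alpha * td_error * x for w, x in zip(weights, features)]


if __name__ == "__main__":
    w = td0_linear_update([0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                          reward=1.0, gamma=0.9, alpha=0.5)
    print(w)
```

Each update costs time linear in the number of features, which is the kind of per-step cost advantage over a neural network critic that the abstract appeals to.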

[3]  arXiv:2405.20351 [pdf, other]
Title: ADR-BC: Adversarial Density Weighted Regression Behavior Cloning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Typically, traditional Imitation Learning (IL) methods first shape a reward or Q function and then use this shaped function within a reinforcement learning (RL) framework to optimize the empirical policy. However, if the shaped reward/Q function does not adequately represent the ground truth reward/Q function, updating the policy within a multi-step RL framework may result in cumulative bias, further impacting policy learning. Although utilizing behavior cloning (BC) to learn a policy by directly mimicking a few demonstrations in a single-step updating manner can avoid cumulative bias, BC tends to greedily imitate demonstrated actions, limiting its capacity to generalize to unseen state-action pairs. To address these challenges, we propose ADR-BC, which aims to enhance behavior cloning through augmented density-based action support, optimizing the policy with this augmented support. Specifically, the objective of ADR-BC shares a similar physical meaning: matching the expert distribution while diverging from the sub-optimal distribution. Therefore, ADR-BC can achieve more robust expert distribution matching. Meanwhile, as a one-step behavior cloning framework, ADR-BC avoids the cumulative bias associated with multi-step RL frameworks. To validate the performance of ADR-BC, we conduct extensive experiments. Specifically, ADR-BC showcases a 10.5% improvement over the previous state-of-the-art (SOTA) generalized IL baseline, CEIL, across all tasks in the Gym-Mujoco domain. Additionally, it achieves an 89.5% improvement over Implicit Q Learning (IQL) using real rewards across all tasks in the Adroit and Kitchen domains. We also conduct extensive ablations to further demonstrate the effectiveness of ADR-BC.

[4]  arXiv:2405.20354 [pdf, other]
Title: Literature Filtering for Systematic Reviews with Transformers
Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Identifying critical research within the growing body of academic work is an essential element of quality research. Systematic review processes, used in evidence-based medicine, formalise this as a procedure that must be followed in a research program. However, it comes with an increasing burden in terms of the time required to identify the important articles of research for a given topic. In this work, we develop a method for building a general-purpose filtering system that matches a research question, posed as a natural language description of the required content, against a candidate set of articles obtained via the application of broad search terms. Our results demonstrate that transformer models, pre-trained on biomedical literature and then fine-tuned for the specific task, offer a promising solution to this problem. The model can remove large volumes of irrelevant articles for most research questions.

[5]  arXiv:2405.20355 [pdf, other]
Title: Enhancing Adversarial Robustness in SNNs with Sparse Gradients
Comments: accepted by ICML 2024
Subjects: Neural and Evolutionary Computing (cs.NE); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Spiking Neural Networks (SNNs) have attracted great attention for their energy-efficient operations and biologically inspired structures, offering potential advantages over Artificial Neural Networks (ANNs) in terms of energy efficiency and interpretability. Nonetheless, similar to ANNs, the robustness of SNNs remains a challenge, especially when facing adversarial attacks. Existing techniques, whether adapted from ANNs or specifically designed for SNNs, exhibit limitations in training SNNs or defending against strong attacks. In this paper, we propose a novel approach to enhance the robustness of SNNs through gradient sparsity regularization. We observe that SNNs exhibit greater resilience to random perturbations compared to adversarial perturbations, even at larger scales. Motivated by this, we aim to narrow the gap between SNNs under adversarial and random perturbations, thereby improving their overall robustness. To achieve this, we theoretically prove that this performance gap is upper bounded by the gradient sparsity of the probability associated with the true label with respect to the input image, laying the groundwork for a practical strategy to train robust SNNs by regularizing the gradient sparsity. We validate the effectiveness of our approach through extensive experiments on both image-based and event-based datasets. The results demonstrate notable improvements in the robustness of SNNs. Our work highlights the importance of gradient sparsity in SNNs and its role in enhancing robustness.

[6]  arXiv:2405.20358 [pdf, other]
Title: Medication Recommendation via Dual Molecular Modalities and Multi-Substructure Distillation
Comments: 14 pages, 9 figures
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

Medication recommendation combines patient medical history with biomedical knowledge to assist doctors in determining medication combinations more accurately and safely. Existing approaches based on molecular knowledge overlook the atomic geometric structure of molecules, failing to capture the high-dimensional characteristics and intrinsic physical properties of medications, leading to structural confusion and the inability to extract useful substructures from individual patient visits. To address these limitations, we propose BiMoRec, which overcomes the inherent lack of essential molecular information in 2D molecular structures by incorporating 3D molecular structures and atomic properties. To retain the fast response required of recommendation systems, BiMoRec maximizes the mutual information between the two molecular modalities through bimodal graph contrastive learning, achieving the integration of 2D and 3D molecular graphs, and finally distills substructures through interaction with single patient visits. Specifically, we use deep learning networks to construct a pre-training method to obtain representations of 2D and 3D molecular structures and substructures, and we use contrastive learning to derive mutual information. Subsequently, we generate fused molecular representations through a trained GNN module, re-determining the relevance of substructure representations in conjunction with the patient's clinical history information. Finally, we generate the final medication combination based on the extracted substructure sequences. Our implementation on the MIMIC-III and MIMIC-IV datasets demonstrates that our method achieves state-of-the-art performance. Compared to the next best baseline, our model improves accuracy by 1.8% while maintaining the same level of DDI as the baseline.

[7]  arXiv:2405.20362 [pdf, other]
Title: Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Comments: Our dataset, tool outputs, and labels will be made available upon publication. This version of the manuscript (May 30, 2024) is updated to reflect an evaluation of Westlaw's AI-Assisted Research
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

Legal practice has witnessed a sharp rise in products incorporating artificial intelligence (AI). Such tools are designed to assist with a wide range of core legal tasks, from search and summarization of caselaw to document drafting. But the large language models used in these tools are prone to "hallucinate," or make up false information, making their use risky in high-stakes domains. Recently, certain legal research providers have touted methods such as retrieval-augmented generation (RAG) as "eliminating" (Casetext, 2023) or "avoid[ing]" hallucinations (Thomson Reuters, 2023), or guaranteeing "hallucination-free" legal citations (LexisNexis, 2023). Because of the closed nature of these systems, systematically assessing these claims is challenging. In this article, we design and report on the first preregistered empirical evaluation of AI-driven legal research tools. We demonstrate that the providers' claims are overstated. While hallucinations are reduced relative to general-purpose chatbots (GPT-4), we find that the AI research tools made by LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) each hallucinate between 17% and 33% of the time. We also document substantial differences between systems in responsiveness and accuracy. Our article makes four key contributions. It is the first to assess and report the performance of RAG-based proprietary legal AI tools. Second, it introduces a comprehensive, preregistered dataset for identifying and understanding vulnerabilities in these systems. Third, it proposes a clear typology for differentiating between hallucinations and accurate legal responses. Last, it provides evidence to inform the responsibilities of legal professionals in supervising and verifying AI outputs, which remains a central open question for the responsible integration of AI into law.

[8]  arXiv:2405.20363 [pdf, other]
Title: LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild
Comments: 7 pages, 3 figures, 5 tables, CVPR 2024 Workshop on Computer Vision in the Wild
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image geolocation is a critical task in various image-understanding applications. However, existing methods often fail when analyzing challenging, in-the-wild images. Inspired by the exceptional background knowledge of multimodal language models, we systematically evaluate their geolocation capabilities using a novel image dataset and a comprehensive evaluation framework. We first collect images from various countries via Google Street View. Then, we conduct both training-free and training-based evaluations on closed-source and open-source multimodal language models. Our findings indicate that closed-source models demonstrate superior geolocation abilities, while open-source models can achieve comparable performance through fine-tuning.

[9]  arXiv:2405.20364 [pdf, other]
Title: Learning 3D Robotics Perception using Inductive Priors
Comments: Georgia Tech Ph.D. Thesis, December 2023. For more details: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Recent advances in deep learning have led to a data-centric intelligence, i.e., artificially intelligent models unlocking the potential to ingest a large amount of data and perform very well at digital tasks such as text-to-image generation, machine-human conversation, and image recognition. This thesis covers the topic of learning with structured inductive bias and priors to design approaches and algorithms unlocking the potential of principle-centric intelligence. Prior knowledge (priors for short), often available in terms of past experience as well as assumptions of how the world works, helps the autonomous agent generalize better and adapt its behavior based on past experience. In this thesis, I demonstrate the use of prior knowledge in three different robotics perception problems: (1) object-centric 3D reconstruction, (2) vision and language for decision-making, and (3) 3D scene understanding. To solve these challenging problems, I propose various sources of prior knowledge, including (1) geometry and appearance priors from synthetic data, (2) modularity and semantic map priors, and (3) semantic, structural, and contextual priors. I study these priors for solving robotics 3D perception tasks and propose ways to efficiently encode them in deep learning models. Some priors are used to warm-start the network for transfer learning, while others are used as hard constraints to restrict the action space of robotics agents. While classical techniques are brittle and fail to generalize to unseen scenarios, and data-centric approaches require a large amount of labeled data, this thesis aims to build intelligent agents which require very little real-world data, or data acquired only from simulation, to generalize to highly dynamic and cluttered environments in novel simulations (i.e. sim2sim) or real-world unseen environments (i.e. sim2real) for a holistic scene understanding of the 3D world.

[10]  arXiv:2405.20380 [pdf, other]
Title: Gradient Inversion of Federated Diffusion Models
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

Diffusion models are becoming de facto generative models, which generate exceptionally high-resolution image data. Training effective diffusion models requires massive real data, which is privately owned by distributed parties. Each data party can collaboratively train diffusion models in a federated learning manner by sharing gradients instead of the raw data. In this paper, we study the privacy leakage risk of gradient inversion attacks. First, we design a two-phase fusion optimization, GIDM, to leverage the well-trained generative model itself as prior knowledge to constrain the inversion search (latent) space, followed by pixel-wise fine-tuning. GIDM is shown to be able to reconstruct images almost identical to the original ones. Considering a more privacy-preserving training scenario, we then argue that locally initialized private training noise $\epsilon$ and sampling step $t$ may raise additional challenges for the inversion attack. To solve this, we propose a triple-optimization GIDM+ that coordinates the optimization of the unknown data, $\epsilon$ and $t$. Our extensive evaluation results demonstrate the vulnerability of sharing gradients for data protection of diffusion models: even high-resolution images can be reconstructed with high quality.

[11]  arXiv:2405.20387 [pdf, ps, other]
Title: Sensitivity Analysis for Piecewise-Affine Approximations of Nonlinear Programs with Polytopic Constraints
Comments: 6 pages, 4 figures, accepted for publication in IEEE Control Systems Letters
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Nonlinear Programs (NLPs) are prevalent in optimization-based control of nonlinear systems. Solving general NLPs is computationally expensive, necessitating the development of fast hardware or tractable suboptimal approximations. This paper investigates the sensitivity of the solutions of NLPs with polytopic constraints when the nonlinear continuous objective function is approximated by a PieceWise-Affine (PWA) counterpart. By leveraging perturbation analysis using a convex modulus, we derive guaranteed bounds on the distance between the optimal solution of the original polytopically-constrained NLP and that of its approximated formulation. Our approach aids in determining criteria for achieving desired solution bounds. Two case studies on the Eggholder function and nonlinear model predictive control of an inverted pendulum demonstrate the theoretical results.

[12]  arXiv:2405.20390 [pdf, other]
Title: Quantitative Convergences of Lie Group Momentum Optimizers
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)

Explicit, momentum-based dynamics that optimize functions defined on Lie groups can be constructed via variational optimization and momentum trivialization. Structure-preserving time discretizations can then turn these dynamics into optimization algorithms. This article investigates two types of discretization: Lie Heavy-Ball, which is a known splitting scheme, and Lie NAG-SC, which is newly proposed. Their convergence rates are explicitly quantified under $L$-smoothness and local strong convexity assumptions. Lie NAG-SC provides acceleration over the momentumless case, i.e. Riemannian gradient descent, but Lie Heavy-Ball does not. When compared to existing accelerated optimizers for general manifolds, both Lie Heavy-Ball and Lie NAG-SC are computationally cheaper and easier to implement, thanks to their utilization of group structure. Only a gradient oracle and the exponential map are required, not the logarithm map or parallel transport, which are computationally costly.

[13]  arXiv:2405.20397 [pdf, other]
Title: Explainable Data-driven Modeling of Adsorption Energy in Heterogeneous Catalysis
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)

The increasing popularity of machine learning (ML) in catalysis has spurred interest in leveraging these techniques to enhance catalyst design. Our study aims to bridge the gap between physics-based studies and data-driven methodologies by integrating ML techniques with eXplainable AI (XAI). Specifically, we employ two XAI techniques: Post-hoc XAI analysis and Symbolic Regression. These techniques help us unravel the correlation between adsorption energy and the properties of the adsorbate-catalyst system. Leveraging a large dataset such as the Open Catalyst Dataset (OC20), we employ a combination of shallow ML techniques and XAI methodologies. Our investigation involves utilizing multiple shallow machine learning techniques to predict adsorption energy, followed by post-hoc analysis for feature importance, inter-feature correlations, and the influence of various feature values on the prediction of adsorption energy. The post-hoc analysis reveals that adsorbate properties exert a greater influence than catalyst properties in our dataset. The top five features ranked by Shapley value are adsorbate electronegativity, the number of adsorbate atoms, catalyst electronegativity, effective coordination number, and the sum of atomic numbers of the adsorbate molecule. Both catalyst and adsorbate electronegativity correlate positively with the predicted adsorption energy. Additionally, symbolic regression yields results consistent with SHAP analysis. It deduces a mathematical relationship indicating that the square of the catalyst electronegativity is directly proportional to the adsorption energy. These consistent correlations resemble those derived from physics-based equations in previous research. Our work establishes a robust framework that integrates ML techniques with XAI, leveraging large datasets like OC20 to enhance catalyst design through model explainability.

[14]  arXiv:2405.20404 [pdf, other]
Title: XPrompt: Explaining Large Language Model's Generation via Joint Prompt Attribution
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Large Language Models (LLMs) have demonstrated impressive performance in complex text generation tasks. However, the contribution of the input prompt to the generated content remains obscure to humans, underscoring the necessity of elucidating and explaining the causality between input and output pairs. Existing works that provide prompt-specific explanations often confine the model output to classification or next-word prediction. The few initial attempts that aim to explain the entire language generation often treat input prompt texts independently, ignoring their combinatorial effects on the follow-up generation. In this study, we introduce a counterfactual explanation framework based on joint prompt attribution, XPrompt, which aims to explain how a few prompt texts collaboratively influence the LLM's complete generation. In particular, we formulate the task of prompt attribution for generation interpretation as a combinatorial optimization problem, and introduce a probabilistic algorithm to search for the causal input combination in the discrete space. We define and utilize multiple metrics to evaluate the produced explanations, demonstrating both the faithfulness and the efficiency of our framework.

[15]  arXiv:2405.20405 [pdf, other]
Title: Private Mean Estimation with Person-Level Differential Privacy
Comments: 67 pages, 3 figures
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

We study differentially private (DP) mean estimation in the case where each person holds multiple samples. Commonly referred to as the "user-level" setting, DP here requires the usual notion of distributional stability when all of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that
\[n = \tilde \Theta\left(\frac{d}{\alpha^2 m} + \frac{d }{ \alpha m^{1/2} \varepsilon} + \frac{d}{\alpha^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)\]
people are necessary and sufficient to estimate the mean up to distance $\alpha$ in $\ell_2$-norm under $\varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate DP (with slightly degraded sample complexity) and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the well known noisy-clipped-mean approach, but the analysis for our setting requires new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables, and a new argument for bounding the bias introduced by clipping.
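The noisy-clipped-mean approach mentioned in this abstract can be sketched in a few lines. What follows is a minimal, generic illustration and not the paper's estimator: it averages each person's samples, clips each per-person mean to an $\ell_2$ ball, and adds Gaussian noise calibrated with the classical Gaussian mechanism (which assumes approximate DP with $\varepsilon < 1$). The function names, the clipping radius `radius`, and the noise calibration are assumptions made for illustration.

```python
import math
import random


def clip_l2(vec, radius):
    """Project vec onto the L2 ball of the given radius."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= radius or norm == 0.0:
        return list(vec)
    scale = radius / norm
    return [x * scale for x in vec]


def person_level_noisy_clipped_mean(people, radius, epsilon, delta, rng=None):
    """Hypothetical person-level DP mean estimate.

    people: list of n persons, each a non-empty list of d-dimensional samples.
    Replacing all of one person's samples moves the average of clipped
    per-person means by at most 2 * radius / n in L2 norm.
    """
    rng = rng or random.Random()
    n = len(people)
    d = len(people[0][0])
    total = [0.0] * d
    for samples in people:
        m = len(samples)
        person_mean = [sum(s[j] for s in samples) / m for j in range(d)]
        clipped = clip_l2(person_mean, radius)
        for j in range(d):
            total[j] += clipped[j]
    sensitivity = 2.0 * radius / n
    # Classical Gaussian-mechanism calibration (assumed; valid for epsilon < 1).
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return [total[j] / n + rng.gauss(0.0, sigma) for j in range(d)]


if __name__ == "__main__":
    data = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(1000)]
    print(person_level_noisy_clipped_mean(data, radius=10.0,
                                          epsilon=0.5, delta=1e-5))
```

Clipping bounds each person's influence but biases the estimate when true per-person means fall outside the ball; the abstract notes that bounding this bias is one of the places where a new argument is needed.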

[16]  arXiv:2405.20410 [pdf, other]
Title: SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Expressive speech-to-speech translation (S2ST) is a key research topic in seamless communication, which focuses on the preservation of semantics and speaker vocal style in translated speech. Early works synthesized speaker-style-aligned speech in order to directly learn the mapping from speech to target speech spectrogram. Without reliance on style-aligned data, recent studies leverage the advances of language modeling (LM) and build cascaded LMs on semantic and acoustic tokens. This work proposes SeamlessExpressiveLM, a single speech language model for expressive S2ST. We decompose the complex source-to-target speech mapping into intermediate generation steps with chain-of-thought prompting. The model is first guided to translate target semantic content and then transfer the speaker style to multi-stream acoustic units. Evaluated on Spanish-to-English and Hungarian-to-English translations, SeamlessExpressiveLM outperforms cascaded LMs in both semantic quality and style transfer, while also achieving better parameter efficiency.

[17]  arXiv:2405.20412 [pdf, other]
Title: Audio2Rig: Artist-oriented deep learning tool for facial animation
Comments: Video examples and description: this https URL
Subjects: Graphics (cs.GR); Machine Learning (cs.LG)

Creating realistic or stylized facial and lip sync animation is a tedious task. It requires a lot of time and skill to sync the lips with audio and convey the right emotion on the character's face. To allow animators to spend more time on the artistic and creative part of the animation, we present Audio2Rig: a new deep learning based tool leveraging previously animated sequences of a show to generate facial and lip sync rig animation from an audio file. Based in Maya, it learns from any production rig without any adjustment and generates high-quality, stylized animations which mimic the style of the show. Audio2Rig fits in the animator workflow: since it generates keys on the rig controllers, the animation can be easily retaken. The method is based on 3 neural network modules which can learn an arbitrary number of controllers. Hence, different configurations can be created for specific parts of the face (such as the tongue, lips or eyes). With Audio2Rig, animators can also pick different emotions and adjust their intensities to experiment or customize the output, and have high-level controls on the keyframe settings. Our method shows excellent results, generating fine animation details while respecting the show style. Finally, as the training relies on the studio data and is done internally, it ensures data privacy and prevents copyright infringement.

[18]  arXiv:2405.20413 [pdf, other]
Title: Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
Comments: 20 pages
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Large Language Models (LLMs) are typically harmless but remain vulnerable to carefully crafted prompts known as "jailbreaks", which can bypass protective measures and induce harmful behavior. Recent advancements in LLMs have incorporated moderation guardrails that can filter outputs, which trigger processing errors for certain malicious questions. Existing red-teaming benchmarks often neglect to include questions that trigger moderation guardrails, making it difficult to evaluate jailbreak effectiveness. To address this issue, we introduce JAMBench, a harmful behavior benchmark designed to trigger and evaluate moderation guardrails. JAMBench involves 160 manually crafted instructions covering four major risk categories at multiple severity levels. Furthermore, we propose a jailbreak method, JAM (Jailbreak Against Moderation), designed to attack moderation guardrails using jailbreak prefixes to bypass input-level filters and a fine-tuned shadow model functionally equivalent to the guardrail model to generate cipher characters to bypass output-level filters. Our extensive experiments on four LLMs demonstrate that JAM achieves higher jailbreak success ($\sim 19.88\times$) and lower filtered-out rates ($\sim 1/6\times$) than baselines.

[19]  arXiv:2405.20414 [pdf, ps, other]
Title: The Impact of Ontology on the Prediction of Cardiovascular Disease Compared to Machine Learning Algorithms
Journal-ref: International journal of online and biomedical engineering, Volume 18, Issue 11, 2022, Pages 143 - 157
Subjects: Machine Learning (cs.LG)

Cardiovascular disease is one of the chronic diseases that is on the rise. Complications occur when cardiovascular disease is not discovered early and correctly diagnosed at the right time. Various machine learning approaches, including ontology-based machine learning techniques, have lately played an essential role in medical science by building automated systems that can identify heart illness. This paper compares and reviews the most prominent machine learning algorithms, as well as ontology-based machine learning classification. Random Forest, Logistic Regression, Decision Tree, Naive Bayes, k-Nearest Neighbours, Artificial Neural Network, and Support Vector Machine were among the classification methods explored. The dataset used consists of 70,000 instances and can be downloaded from the Kaggle website. The findings are assessed using performance measures generated from the confusion matrix, such as F-Measure, Accuracy, Recall, and Precision. The results showed that the ontology outperformed all the machine learning algorithms.
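For readers unfamiliar with the performance measures named in this abstract, the sketch below shows how accuracy, precision, recall, and the F-measure (F1) follow from the four cells of a binary confusion matrix. The counts in the usage example are made up for illustration and are not results from the paper; the function name `binary_metrics` is likewise an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard measures from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-measure (F1) is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    print(binary_metrics(tp=40, fp=10, fn=20, tn=30))
```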

[20]  arXiv:2405.20416 [pdf, other]
Title: First Tree-like Quantum Data Structure: Quantum B+ Tree
Subjects: Databases (cs.DB)

Quantum computing is a popular topic in computer science, which has recently attracted many studies in various areas such as machine learning and network. However, the topic of quantum data structures seems neglected. There is an open problem in the database area: Can we improve existing data structures by quantum techniques? Consider a dataset of key-record pairs. Given an interval as a query range, a classical B+ tree can report all the records with keys within this interval, which is called a range query, in O(log N + k) time, where N is the total number of records and k is the output size. It is asymptotically optimal in a classical computer but not efficient enough in a quantum computer, because it is expected that the execution time and the output size are linear in a quantum computer.
In this paper, we propose the quantum range query problem. Different from the classical range queries, a quantum range query returns the results in quantum bits, which has broad potential applications due to the foreseeable advance of quantum computers and quantum algorithms. To the best of our knowledge, we design the first tree-like quantum data structure called the quantum B+ tree. Based on this data structure, we propose a hybrid quantum-classical algorithm to do the range search. It answers a static quantum range query in O(log_B N) time, which is asymptotically optimal in quantum computers. Since the execution time does not depend on the output size (i.e., k, which could be as large as O(N)), it is significantly faster than the classical data structure. Moreover, we extend our quantum B+ tree to answer the dynamic and d-dimensional quantum range queries efficiently in O(log^2_B N) and O(log^d_B N) time, respectively. Our experimental results show that our proposed quantum data structures achieve up to 1000x improvement in the number of memory accesses compared to their classical competitors.
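The classical O(log N + k) range query that this abstract uses as its baseline can be illustrated with a short sketch. A sorted array with binary search stands in for the B+ tree here (an assumption made to keep the example small; it has the same asymptotic query cost for static data): two binary searches locate the interval boundaries in O(log N), and copying the k matching records costs O(k). The function name `range_query` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def range_query(sorted_keys, records, lo, hi):
    """Return all records whose key lies in the closed interval [lo, hi].

    sorted_keys must be in ascending order, and records[i] must be the
    record stored under sorted_keys[i].
    """
    start = bisect_left(sorted_keys, lo)   # first index with key >= lo
    end = bisect_right(sorted_keys, hi)    # first index with key > hi
    return records[start:end]              # O(k) copy of the matches


if __name__ == "__main__":
    keys = [1, 3, 5, 7, 9]
    recs = ["a", "b", "c", "d", "e"]
    print(range_query(keys, recs, 3, 7))
```

The O(k) term is the output-size dependence that, according to the abstract, the quantum range query avoids by returning results in quantum bits.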

[21]  arXiv:2405.20419 [pdf, other]
Title: Enhancing Antibiotic Stewardship using a Natural Language Approach for Better Feature Representation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

The rapid emergence of antibiotic-resistant bacteria is recognized as a global healthcare crisis, undermining the efficacy of life-saving antibiotics. This crisis is driven by the improper use and overuse of antibiotics, which escalates bacterial resistance. In response, this study explores the use of clinical decision support systems, enhanced through the integration of electronic health records (EHRs), to improve antibiotic stewardship. However, EHR systems present numerous data-level challenges, complicating the effective synthesis and utilization of data. In this work, we transform EHR data into a serialized textual representation and employ pretrained foundation models to demonstrate how this enhanced feature representation can aid in antibiotic susceptibility predictions. Our results suggest that this text representation, combined with foundation models, provides a valuable tool to increase interpretability and support antibiotic stewardship efforts.

[22]  arXiv:2405.20420 [pdf, other]
Title: Back to the Basics on Predicting Transfer Performance
Comments: 15 pages, 3 figures, 2 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

In the evolving landscape of deep learning, selecting the best pre-trained models from a growing number of choices is a challenge. Transferability scorers propose alleviating this scenario, but their recent proliferation, ironically, poses the challenge of their own assessment. In this work, we propose both robust benchmark guidelines for transferability scorers, and a well-founded technique to combine multiple scorers, which we show consistently improves their results. We extensively evaluate 13 scorers from literature across 11 datasets, comprising generalist, fine-grained, and medical imaging datasets. We show that few scorers match the predictive performance of the simple raw metric of models on ImageNet, and that all predictors suffer on medical datasets. Our results highlight the potential of combining different information sources for reliably predicting transferability across varied domains.

[23]  arXiv:2405.20421 [pdf, other]
Title: Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA
Subjects: Artificial Intelligence (cs.AI)

Large Multimodal Models (LMMs) have shown remarkable progress in the field of medical Visual Question Answering (Med-VQA), achieving high accuracy on existing benchmarks. However, their reliability under robust evaluation is questionable. This study reveals that state-of-the-art models, when subjected to simple probing evaluation, perform worse than random guessing on medical diagnosis questions. To address this critical evaluation problem, we introduce the Probing Evaluation for Medical Diagnosis (ProbMed) dataset to rigorously assess LMM performance in medical imaging through probing evaluation and procedural diagnosis. Particularly, probing evaluation features pairing original questions with negation questions with hallucinated attributes, while procedural diagnosis requires reasoning across various diagnostic dimensions for each image, including modality recognition, organ identification, clinical findings, abnormalities, and positional grounding. Our evaluation reveals that top-performing models like GPT-4V and Gemini Pro perform worse than random guessing on specialized diagnostic questions, indicating significant limitations in handling fine-grained medical inquiries. Besides, models like LLaVA-Med struggle even with more general questions, and results from CheXagent demonstrate the transferability of expertise across different modalities of the same organ, showing that specialized domain knowledge is still crucial for improving performance. This study underscores the urgent need for more robust evaluation to ensure the reliability of LMMs in critical fields like medical diagnosis, and current LMMs are still far from applicable to those fields.

[24]  arXiv:2405.20423 [pdf, other]
Title: Dynamics and Contracts for an Agent with Misspecified Beliefs
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)

We study a single-agent contracting environment where the agent has misspecified beliefs about the outcome distributions for each chosen action. First, we show that for a myopic Bayesian learning agent with only two possible actions, the empirical frequency of the chosen actions converges to a Berk-Nash equilibrium. However, through a constructed example, we illustrate that this convergence in action frequencies fails when the agent has three or more actions. Furthermore, with multiple actions, even computing an $\varepsilon$-Berk-Nash equilibrium requires at least quasi-polynomial time under the Exponential Time Hypothesis (ETH) for the PPAD-class. This finding poses a significant challenge to the existence of simple learning dynamics that converge in action frequencies. Motivated by this challenge, we focus on the contract design problems for an agent with misspecified beliefs and two possible actions. We show that the revenue-optimal contract, under a Berk-Nash equilibrium, can be computed in polynomial time. Perhaps surprisingly, we show that even a minor degree of misspecification can result in a significant reduction in optimal revenue.

[25]  arXiv:2405.20424 [pdf, other]
Title: Euclidean Maximum Matchings in the Plane---Local to Global
Subjects: Computational Geometry (cs.CG); Discrete Mathematics (cs.DM)

Let $M$ be a perfect matching on a set of points in the plane where every edge is a line segment between two points. We say that $M$ is globally maximum if it is a maximum-length matching on all points. We say that $M$ is $k$-local maximum if for any subset $M'=\{a_1b_1,\dots,a_kb_k\}$ of $k$ edges of $M$ it holds that $M'$ is a maximum-length matching on points $\{a_1,b_1,\dots,a_k,b_k\}$. We show that local maximum matchings are good approximations of global ones.
Let $\mu_k$ be the infimum ratio of the length of any $k$-local maximum matching to the length of any global maximum matching, over all finite point sets in the Euclidean plane. It is known that $\mu_k\geqslant \frac{k-1}{k}$ for any $k\geqslant 2$. We show the following improved bounds for $k\in\{2,3\}$: $\sqrt{3/7}\leqslant\mu_2< 0.93 $ and $\sqrt{3}/2\leqslant\mu_3< 0.98$. We also show that every pairwise crossing matching is unique and it is globally maximum.
Towards our proof of the lower bound for $\mu_2$ we show the following result which is of independent interest: If we increase the radii of pairwise intersecting disks by factor $2/\sqrt{3}$, then the resulting disks have a common intersection.
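The definition of a $k$-local maximum matching can be checked by brute force for $k=2$. The sketch below is an illustration only, with a hypothetical function name and a small floating-point tolerance chosen as an assumption: for every pair of edges it compares the current length against the two other perfect matchings on the same four points.

```python
from itertools import combinations
from math import dist


def is_2_local_maximum(matching, tol=1e-12):
    """Check the k = 2 case of the k-local maximum definition.

    matching: list of edges, each a pair of (x, y) points; all endpoints
    are assumed distinct. tol is an assumed floating-point tolerance.
    """
    for (a, b), (c, d) in combinations(matching, 2):
        current = dist(a, b) + dist(c, d)
        # The only other perfect matchings on the points {a, b, c, d}.
        alt1 = dist(a, c) + dist(b, d)
        alt2 = dist(a, d) + dist(b, c)
        if current + tol < max(alt1, alt2):
            return False
    return True


# Corners of the unit square: the two diagonals cross and beat the sides.
diagonals = [((0, 0), (1, 1)), ((1, 0), (0, 1))]
sides = [((0, 0), (1, 0)), ((0, 1), (1, 1))]
print(is_2_local_maximum(diagonals))  # True
print(is_2_local_maximum(sides))      # False
```

The check runs in time quadratic in the number of edges, since every pair of edges is examined once.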

[26]  arXiv:2405.20426 [pdf, ps, other]
Title: Quality of Non-Convergent Best Response Processes in Multi-Agent Systems through Sink Equilibrium
Subjects: Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY)

Examining the behavior of multi-agent systems is vitally important to many emerging distributed applications, and game theory has emerged as a powerful tool set with which to do so. The main approach of game-theoretic techniques is to model agents as players in a game, and predict the emergent behavior through the relevant Nash equilibrium. The virtue of this viewpoint is that by assuming that self-interested decision-making processes lead to Nash equilibrium, system behavior can then be captured by Nash equilibrium without studying the decision-making processes explicitly. This approach has seen success in a wide variety of domains, such as sensor coverage, traffic networks, auctions, and network coordination. However, in many other problem settings, Nash equilibria are not necessarily guaranteed to exist or emerge from self-interested processes. Thus the main focus of the paper is on the study of sink equilibria, which are defined as the attractors of these decision-making processes. By classifying system outcomes through a global objective function, we can analyze the resulting approximation guarantees that sink equilibria have for a given game. Our main result is an approximation guarantee on the sink equilibria through a newly introduced metric of misalignment, which captures how uniform agents are in their self-interested decision making. Overall, sink equilibria occur naturally in many multi-agent contexts, and we display our results on their quality with respect to two practical problem settings.

[27]  arXiv:2405.20429 [pdf, other]
Title: Quantum Preference Query
Subjects: Databases (cs.DB)

Given a large dataset of many tuples, it is hard for users to pick out their preferred tuples. Thus, the preference query problem, which is to find the most preferred tuples from a dataset, is widely discussed in the database area. In this problem, a utility function is given by the user to evaluate to what extent the user prefers a tuple. However, considering a dataset consisting of N tuples, the existing algorithms need O(N) time to answer a query, or need O(N) time for a cold start to answer a query. The reason is that in a classical computer, linear time is needed to evaluate the utility function on N tuples. In this paper, we discuss the Quantum Preference Query (QPQ) problem, where the dataset is given in a quantum memory, and we use a quantum computer to return the answers. Due to quantum parallelism, quantum algorithms can theoretically perform better than their classical competitors. We discuss this problem with different kinds of input and output. In the QPQ problem, the input can be a number k or a threshold theta. Given k, the problem is to return k tuples with the highest utilities. Given theta, the problem is to return all the tuples with utilities higher than theta. Also, in the QPQ problem, the output can be classical (i.e., a list of tuples) or quantum (i.e., a superposition in quantum bits). We propose four quantum algorithms to solve the problems in the above four scenarios. We analyze the number of memory accesses needed for each quantum algorithm, which shows that the proposed quantum algorithms are at least quadratically faster than their classical competitors. In our experiments, we show that to answer a QPQ problem, the quantum algorithms achieve up to 1000x improvement in the number of memory accesses over their classical competitors, which suggests that the QPQ problem could be a future direction for the study of preference query problems.
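The two classical query types described above (top-k and threshold) can be sketched as a linear-scan baseline. This is only an illustration of the O(N) classical cost the abstract refers to, not any of the paper's quantum algorithms; the linear utility function and the names `utility`, `top_k`, and `above_threshold` are assumptions made for the example.

```python
import heapq


def utility(tuple_, weights):
    """Assumed linear utility: weighted sum of the tuple's attributes."""
    return sum(w * x for w, x in zip(weights, tuple_))


def top_k(dataset, weights, k):
    """Return the k tuples with the highest utilities.

    Every tuple is scored once, so the scan is linear in N.
    """
    return heapq.nlargest(k, dataset, key=lambda t: utility(t, weights))


def above_threshold(dataset, weights, theta):
    """Return all tuples whose utility is strictly higher than theta."""
    return [t for t in dataset if utility(t, weights) > theta]


data = [(1, 5), (4, 4), (6, 1), (2, 2)]
w = (0.5, 0.5)
print(top_k(data, w, 2))              # [(4, 4), (6, 1)]
print(above_threshold(data, w, 3.0))  # [(4, 4), (6, 1)]
```

Both functions must touch every tuple to evaluate its utility, which is the linear-time bottleneck the abstract attributes to classical computers.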

[28]  arXiv:2405.20430 [pdf, other]
Title: Enhancing Performance for Highly Imbalanced Medical Data via Data Regularization in a Federated Learning Setting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The increased availability of medical data has significantly impacted healthcare by enabling the application of machine / deep learning approaches in various instances. However, medical datasets are usually small and scattered across multiple providers, suffer from high class-imbalance, and are subject to stringent data privacy constraints. In this paper, the application of a data regularization algorithm, suitable for learning under high class-imbalance, in a federated learning setting is proposed. Specifically, the goal of the proposed method is to enhance model performance for cardiovascular disease prediction by tackling the class-imbalance that typically characterizes datasets used for this purpose, as well as by leveraging patient data available in different nodes of a federated ecosystem without compromising their privacy and enabling more resource sensitive allocation. The method is evaluated across four datasets for cardiovascular disease prediction, which are scattered across different clients, achieving improved performance. Meanwhile, its robustness under various hyperparameter settings, as well as its ability to adapt to different resource allocation scenarios, is verified.

[29]  arXiv:2405.20431 [pdf, other]
Title: Exploring the Practicality of Federated Learning: A Survey Towards the Communication Perspective
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Federated Learning (FL) is a promising paradigm that offers significant advancements in privacy-preserving, decentralized machine learning by enabling collaborative training of models across distributed devices without centralizing data. However, the practical deployment of FL systems faces a significant bottleneck: the communication overhead caused by frequently exchanging large model updates between numerous devices and a central server. This communication inefficiency can hinder training speed, model performance, and the overall feasibility of real-world FL applications. In this survey, we investigate various strategies and advancements made in communication-efficient FL, highlighting their impact and potential to overcome the communication challenges inherent in FL systems. Specifically, we define measures for communication efficiency, analyze sources of communication inefficiency in FL systems, and provide a taxonomy and comprehensive review of state-of-the-art communication-efficient FL methods. Additionally, we discuss promising future research directions for enhancing the communication efficiency of FL systems. By addressing the communication bottleneck, FL can be effectively applied and enable scalable and practical deployment across diverse applications that require privacy-preserving, decentralized machine learning, such as IoT, healthcare, or finance.

[30]  arXiv:2405.20433 [pdf, other]
Title: Efficient Industrial Refrigeration Scheduling with Peak Pricing
Subjects: Systems and Control (eess.SY)

The widespread use of industrial refrigeration systems across various sectors contributes significantly to global energy consumption, highlighting substantial opportunities for energy conservation through intelligent control design. As such, this work focuses on the design of control algorithms for industrial refrigeration that minimize operational costs and provide efficient heat extraction. By adopting tools from inventory control, we characterize the structure of these optimal control policies, exploring the impact of different energy cost-rate structures such as time-of-use (TOU) pricing and peak pricing. While classical threshold policies are optimal under TOU costs, introducing peak pricing challenges their optimality, emphasizing the need for carefully designed control strategies in the presence of significant peak costs. We provide theoretical findings and simulation studies on this phenomenon, offering insights for more efficient industrial refrigeration management.

[31]  arXiv:2405.20434 [pdf, other]
Title: Facilitating Human-LLM Collaboration through Factuality Scores and Source Attributions
Comments: Submitted to the Trust and Reliance in Evolving Human-AI Workflows (TREW) Workshop at CHI 2024
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

While humans increasingly rely on large language models (LLMs), these models are susceptible to generating inaccurate or false information, also known as "hallucinations". Technical advancements have been made in algorithms that detect hallucinated content by assessing the factuality of the model's responses and attributing sections of those responses to specific source documents. However, there is limited research on how to effectively communicate this information to users in ways that will help them appropriately calibrate their trust toward LLMs. To address this issue, we conducted a scenario-based study (N=104) to systematically compare the impact of various design strategies for communicating factuality and source attribution on participants' ratings of trust, preferences, and ease in validating response accuracy. Our findings reveal that participants preferred a design in which phrases within a response were color-coded based on the computed factuality scores. Additionally, participants increased their trust ratings when relevant sections of the source material were highlighted or responses were annotated with reference numbers corresponding to those sources, compared to when they received no annotation in the source material. Our study offers practical design guidelines to facilitate human-LLM collaboration, and it promotes a new human role: carefully evaluating and taking responsibility for the use of LLM outputs.

[32]  arXiv:2405.20435 [pdf, other]
Title: Deep Learning for Computing Convergence Rates of Markov Chains
Subjects: Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

Convergence rate analysis for general state-space Markov chains is fundamentally important in areas such as Markov chain Monte Carlo and algorithmic analysis (for computing explicit convergence bounds). This problem, however, is notoriously difficult because traditional analytical methods often do not generate practically useful convergence bounds for realistic Markov chains. We propose the Deep Contractive Drift Calculator (DCDC), the first general-purpose sample-based algorithm for bounding the convergence of Markov chains to stationarity in Wasserstein distance. The DCDC has two components. First, inspired by the new convergence analysis framework in (Qu et al., 2023), we introduce the Contractive Drift Equation (CDE), the solution of which leads to an explicit convergence bound. Second, we develop an efficient neural-network-based CDE solver. Equipped with these two components, DCDC solves the CDE and converts the solution into a convergence bound. We analyze the sample complexity of the algorithm and further demonstrate the effectiveness of the DCDC by generating convergence bounds for realistic Markov chains arising from stochastic processing networks as well as constant step-size stochastic optimization.

[33]  arXiv:2405.20439 [pdf, other]
Title: Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning
Comments: 25 pages, 10 figures, 2 tables
Subjects: Machine Learning (cs.LG)

Sharpness-Aware Minimization (SAM) has emerged as a promising alternative optimizer to stochastic gradient descent (SGD). The originally proposed motivation behind SAM was to bias neural networks towards flatter minima that are believed to generalize better. However, recent studies have shown conflicting evidence on the relationship between flatness and generalization, suggesting that flatness does not fully explain SAM's success. Sidestepping this debate, we identify an orthogonal effect of SAM that is beneficial out-of-distribution: we argue that SAM implicitly balances the quality of diverse features. SAM achieves this effect by adaptively suppressing well-learned features, which gives the remaining features an opportunity to be learned. We show that this mechanism is beneficial in datasets that contain redundant or spurious features, where SGD falls for the simplicity bias and would not otherwise learn all available features. Our insights are supported by experiments on real data: we demonstrate that SAM improves the quality of features in datasets containing redundant or spurious features, including CelebA, Waterbirds, CIFAR-MNIST, and DomainBed.

[34]  arXiv:2405.20441 [pdf, other]
Title: SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Large Language Models (LLMs) have demonstrated potential in cybersecurity applications but have also caused lower confidence due to problems like hallucinations and a lack of truthfulness. Existing benchmarks provide general evaluations but do not sufficiently address the practical and applied aspects of LLM performance in cybersecurity-specific tasks. To address this gap, we introduce SECURE (Security Extraction, Understanding & Reasoning Evaluation), a benchmark designed to assess LLM performance in realistic cybersecurity scenarios. SECURE includes six datasets focused on the Industrial Control System sector to evaluate knowledge extraction, understanding, and reasoning based on industry-standard sources. Our study evaluates seven state-of-the-art models on these tasks, provides insights into their strengths and weaknesses in cybersecurity contexts, and offers recommendations for improving LLM reliability as cyber advisory tools.

[35]  arXiv:2405.20443 [pdf, ps, other]
Title: P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion models and multi-scale features are essential components in semantic segmentation tasks that deal with remote-sensing images. They contribute to improved segmentation boundaries and offer significant contextual information. U-net-like architectures are frequently employed in diffusion models for segmentation tasks. These architectural designs include dense skip connections that may pose challenges for interpreting intermediate features. Consequently, they might not efficiently convey semantic information throughout various layers of the encoder-decoder architecture. To address these challenges, we propose a new model for semantic segmentation known as the diffusion model with parallel multi-scale branches. This model consists of Parallel Multiscale Diffusion modules (P-MSDiff) and a Cross-Bridge Linear Attention mechanism (CBLA). P-MSDiff enhances the understanding of semantic information across multiple levels of granularity and detects repetitive distribution data through the integration of recursive denoising branches. It further facilitates the amalgamation of data by connecting relevant branches to the primary framework to enable concurrent denoising. Furthermore, within the interconnected transformer architecture, the linear attention (LA) module has been replaced with the CBLA module. This module integrates a semidefinite matrix linked to the query into the dot product computation of keys and values. This integration enables the adaptation of queries within the LA framework. This adjustment enhances the structure for multi-head attention computation, leading to enhanced network performance. CBLA is also a plug-and-play module. Our model demonstrates superior performance based on the J1 metric on both the UAVid and Vaihingen Building datasets, showing improvements of 1.60% and 1.40% over strong baseline models, respectively.

[36]  arXiv:2405.20445 [pdf, other]
Title: GraphAny: A Foundation Model for Node Classification on Any Graph
Comments: Preprint. Work in progress
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Foundation models that can perform inference on any new task without requiring specific training have revolutionized machine learning in vision and language applications. However, applications involving graph-structured data remain a tough nut for foundation models, due to challenges in the unique feature and label spaces associated with each graph. Traditional graph ML models such as graph neural networks (GNNs) trained on graphs cannot perform inference on a new graph with feature and label spaces different from the training ones. Furthermore, existing models learn functions specific to the training graph and cannot generalize to new graphs. In this work, we tackle these two challenges with a new foundational architecture for inductive node classification named GraphAny. GraphAny models inference on a new graph as an analytical solution to a LinearGNN, thereby solving the first challenge. To solve the second challenge, we learn attention scores for each node to fuse the predictions of multiple LinearGNNs. Specifically, the attention module is carefully parameterized as a function of the entropy-normalized distance-features between the predictions of multiple LinearGNNs to ensure generalization to new graphs. Empirically, GraphAny trained on the Wisconsin dataset with only 120 labeled nodes can effectively generalize to 30 new graphs with an average accuracy of 67.26% in an inductive manner, surpassing GCN and GAT trained in the supervised regime, as well as other inductive baselines.

[37]  arXiv:2405.20446 [pdf, other]
Title: Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation
Comments: 7 pages, 3 figures
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Retrieval Augmented Generation (RAG) systems have shown great promise in natural language processing. However, their reliance on data stored in a retrieval database, which may contain proprietary or sensitive information, introduces new privacy concerns. Specifically, an attacker may be able to infer whether a certain text passage appears in the retrieval database by observing the outputs of the RAG system, an attack known as a Membership Inference Attack (MIA). Despite the significance of this threat, MIAs against RAG systems have remained under-explored. This study addresses this gap by introducing an efficient and easy-to-use method for conducting MIA against RAG systems. We demonstrate the effectiveness of our attack using two benchmark datasets and multiple generative models, showing that the membership of a document in the retrieval database can be efficiently determined through the creation of an appropriate prompt in both black-box and gray-box settings. Our findings highlight the importance of implementing security countermeasures in deployed RAG systems to protect the privacy and security of retrieval databases.

[38]  arXiv:2405.20448 [pdf, other]
Title: Knockout: A simple way to handle missing inputs
Subjects: Machine Learning (cs.LG)

Deep learning models can tease out information from complex inputs. The richer the inputs, the better these models usually perform. However, models that leverage rich inputs (e.g. multi-sensor, multi-modality, multi-view) can be difficult to deploy widely because some inputs may be missing during deployment. Current popular solutions to this problem include marginalization, imputation, and training multiple models. Marginalization can obtain calibrated predictions but it is computationally costly and therefore is only feasible for low-dimensional inputs. Imputation may result in miscalibrated predictions because it approximates predictions using point estimates, and it does not work for high-dimensional inputs (e.g. images). Training multiple models, where each model takes a different subset of inputs, can work well but requires knowing missing input patterns in advance. Furthermore, training multiple models is costly when models are built on top of foundation models. We propose an efficient way to learn both the conditional distribution using full inputs and the marginal distributions using partial inputs simultaneously using a single model and input mask-out. Input mask-out ensures that learning the marginal distributions does not interfere with learning the conditional distribution. Our approach is general and can be applied to both low- and high-dimensional inputs. We evaluate mask-out in several simulations to show that it can help a single model efficiently learn both conditional and marginal distributions. Experimental results on multiple real-world datasets in both classification and segmentation demonstrate the utility of mask-out.
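The general idea of input mask-out during training can be sketched as below. The abstract does not give implementation details, so everything here is an assumption for illustration: the sentinel value `PLACEHOLDER`, the per-input drop probability, and the function name `mask_out` are hypothetical and are not taken from the paper.

```python
import random

# Assumed sentinel marking a missing input; it should lie outside the
# range of real input values so the model can tell the two apart.
PLACEHOLDER = -1.0


def mask_out(inputs, drop_prob, rng):
    """Randomly replace each input with PLACEHOLDER with probability drop_prob.

    With drop_prob = 0 the model sees full inputs (the conditional
    distribution); with drop_prob > 0 some training examples carry only
    partial inputs (the marginal distributions).
    """
    return [PLACEHOLDER if rng.random() < drop_prob else x for x in inputs]


rng = random.Random(0)
sample = [0.2, 0.7, 0.5, 0.9]
print(mask_out(sample, 0.5, rng))
```

At deployment time, an input that is actually missing would be filled with the same sentinel, so a single trained model can serve every missing-input pattern without that pattern being known in advance.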

[39]  arXiv:2405.20449 [pdf, other]
Title: Optimization, guidance, and control of low-thrust transfers from the Lunar Gateway to low lunar orbit
Comments: 19 pages, 12 figures, IAC 2023, ACTA ASTRONAUTICA 2024
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC); Space Physics (physics.space-ph)

The Gateway will represent a primary space system useful for the Artemis program, Earth-Moon transportation, and deep space exploration. It is expected to serve as a staging location on the way to the lunar surface. This study focuses on low-thrust transfer dynamics, from the Near-Rectilinear Halo Orbit traveled by Gateway to a specified Low-altitude Lunar Orbit (LLO). This research addresses: (i) determination of the minimum-time low-thrust trajectory and (ii) design, implementation, and testing of a guidance and control architecture, for a space vehicle that travels from Gateway to LLO. Orbit dynamics is described in terms of modified equinoctial elements, in the context of a high-fidelity ephemeris model. The minimum-time trajectory from Gateway to a specified lunar orbit is found through an indirect heuristic approach, which uses the analytical conditions arising in optimal control theory in conjunction with a heuristic technique. However, future missions will pursue a growing level of autonomy, and this circumstance requires the design of an efficient feedback guidance scheme, capable of compensating for nonnominal flight conditions. This research proposes nonlinear orbit control as a viable option for autonomous explicit guidance of low-thrust transfers from Gateway to LLO. This approach allows defining a feedback law that enjoys quasi-global stability properties without requiring any offline reference trajectory. The overall spacecraft dynamics is modeled including attitude control and actuation. The latter is delegated to an array of reaction wheels, arranged in a pyramidal configuration. Guidance, attitude control, and actuation are implemented in an iterative scheme. Monte Carlo simulations demonstrate that the guidance and control architecture is effective with random starting points from Gateway and the temporary unavailability of the propulsion system.

[40]  arXiv:2405.20450 [pdf, other]
Title: Decentralized AI: Permissionless LLM Inference on POKT Network
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

POKT Network's decentralized Remote Procedure Call (RPC) infrastructure, surpassing 740 billion requests since launching on MainNet in 2020, is well-positioned to extend into providing AI inference services with minimal design or implementation modifications. This litepaper illustrates how the network's open-source and permissionless design aligns incentives among model researchers, hardware operators, API providers and users whom we term model Sources, Suppliers, Gateways and Applications respectively. Through its Relay Mining algorithm, POKT creates a transparent marketplace where costs and earnings directly reflect cryptographically verified usage. This decentralized framework offers large model AI researchers a new avenue to disseminate their work and generate revenue without the complexities of maintaining infrastructure or building end-user products. Supply scales naturally with demand, as evidenced in recent years and the protocol's free market dynamics. POKT Gateways facilitate network growth, evolution, adoption, and quality by acting as application-facing load balancers, providing value-added features without managing LLM nodes directly. This vertically decoupled network, battle tested over several years, is set up to accelerate the adoption, operation, innovation and financialization of open-source models. It is the first mature permissionless network whose quality of service competes with centralized entities set up to provide application grade inference.

[41]  arXiv:2405.20452 [pdf, other]
Title: Understanding Encoder-Decoder Structures in Machine Learning Using Information Measures
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

We present new results to model and understand the role of encoder-decoder design in machine learning (ML) from an information-theoretic angle. We use two main information concepts, information sufficiency (IS) and mutual information loss (MIL), to represent predictive structures in machine learning. Our first main result provides a functional expression that characterizes the class of probabilistic models consistent with an IS encoder-decoder latent predictive structure. This result formally justifies the encoder-decoder forward stages many modern ML architectures adopt to learn latent (compressed) representations for classification. To illustrate IS as a realistic and relevant model assumption, we revisit some known ML concepts and present some interesting new examples: invariant, robust, sparse, and digital models. Furthermore, our IS characterization allows us to tackle the fundamental question of how much performance (predictive expressiveness) could be lost, using the cross entropy risk, when a given encoder-decoder architecture is adopted in a learning setting. Here, our second main result shows that a mutual information loss quantifies the lack of expressiveness attributed to the choice of a (biased) encoder-decoder ML design. Finally, we address the problem of universal cross-entropy learning with an encoder-decoder design where necessary and sufficient conditions are established to meet this requirement. In all these results, Shannon's information measures offer new interpretations and explanations for representation learning.

[42]  arXiv:2405.20455 [pdf, other]
Title: DepsRAG: Towards Managing Software Dependencies using Large Language Models
Subjects: Software Engineering (cs.SE)

Managing software dependencies is a crucial maintenance task in software development and is becoming a rapidly growing research field, especially in light of the significant increase in software supply chain attacks. Specialized expertise and substantial developer effort are required to fully comprehend dependencies and reveal hidden properties about the dependencies (e.g., number of dependencies, dependency chains, depth of dependencies).
Recent advancements in Large Language Models (LLMs) allow the retrieval of information from various data sources for response generation, thus providing a new opportunity to uniquely manage software dependencies. To highlight the potential of this technology, we present DepsRAG, a proof-of-concept Retrieval Augmented Generation (RAG) approach that constructs direct and transitive dependencies of software packages as a Knowledge Graph (KG) in four popular software ecosystems. DepsRAG can answer user questions about software dependencies by automatically generating necessary queries to retrieve information from the KG, and then augmenting the input of LLMs with the retrieved information. DepsRAG can also perform Web search to answer questions that the LLM cannot directly answer via the KG. We identify tangible benefits that DepsRAG can offer and discuss its limitations.
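
As an illustration of the properties mentioned above (number of dependencies, dependency chains, depth), the following minimal Python sketch walks a small dependency graph breadth-first. The package names and the `DIRECT_DEPS` mapping are hypothetical, and the sketch is not the knowledge-graph construction used by DepsRAG.

```python
from collections import deque

# Hypothetical direct-dependency edges: package -> packages it depends on.
DIRECT_DEPS = {
    "app": ["web", "orm"],
    "web": ["http", "json"],
    "orm": ["sql", "json"],
    "http": [],
    "json": [],
    "sql": [],
}

def transitive_deps(graph, root):
    """Return {dependency: depth} for every package reachable from root.

    Breadth-first search, so each depth is the length of the shortest
    dependency chain from root to that package.
    """
    depth = {}
    queue = deque([(root, 0)])
    while queue:
        pkg, d = queue.popleft()
        for dep in graph.get(pkg, []):
            # Skip packages already seen and guard against cycles back to root.
            if dep not in depth and dep != root:
                depth[dep] = d + 1
                queue.append((dep, d + 1))
    return depth

deps = transitive_deps(DIRECT_DEPS, "app")
num_deps = len(deps)            # direct plus transitive dependencies
max_depth = max(deps.values())  # longest shortest-chain depth
```

Shared dependencies such as `json` are counted once, which is why a graph traversal is used rather than summing per-package dependency lists.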

[43]  arXiv:2405.20456 [pdf, other]
Title: Scaling Laws for the Value of Individual Data Points in Machine Learning
Comments: ICML 2024 camera-ready
Subjects: Machine Learning (cs.LG)

Recent works have shown that machine learning models improve at a predictable rate with the total amount of training data, leading to scaling laws that describe the relationship between error and dataset size. These scaling laws can help design a model's training dataset, but they typically take an aggregate view of the data by only considering the dataset's size. We introduce a new perspective by investigating scaling behavior for the value of individual data points: we find that a data point's contribution to a model's performance shrinks predictably with the size of the dataset in a log-linear manner. Interestingly, there is significant variability in the scaling exponent among different data points, indicating that certain points are more valuable in small datasets while others are relatively more useful as a part of large datasets. We provide learning theory to support our scaling law, and we observe empirically that it holds across diverse model classes. We further propose a maximum likelihood estimator and an amortized estimator to efficiently learn the individualized scaling behaviors from a small number of noisy observations per data point. Using our estimators, we provide insights into factors that influence the scaling behavior of different data points. Finally, we demonstrate applications of the individualized scaling laws to data valuation and data subset selection. Overall, our work represents a first step towards understanding and utilizing scaling properties for the value of individual data points.
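
To make the log-linear relationship concrete, the sketch below assumes the contribution of a data point at dataset size k follows a power law c / k^alpha, so that its logarithm is linear in log k, and recovers c and alpha by ordinary least squares. This functional form is an assumed reading of the abstract, and the plain least-squares fit is a stand-in, not the maximum likelihood or amortized estimator proposed in the paper.

```python
import math

def fit_log_linear(sizes, values):
    """Least-squares fit of log(value) = log(c) - alpha * log(size).

    Returns (c, alpha). All sizes and values must be positive.
    """
    xs = [math.log(k) for k in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic, noise-free observations generated with c = 2.0 and alpha = 1.5.
sizes = [10, 100, 1000, 10000]
values = [2.0 * k ** -1.5 for k in sizes]
c_hat, alpha_hat = fit_log_linear(sizes, values)
```

Under this assumed form, a point with a larger alpha loses value faster as the dataset grows, which matches the abstract's remark that some points matter more in small datasets.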

[44]  arXiv:2405.20457 [pdf, other]
Title: Online network topology shapes personal narratives and hashtag generation
Comments: Will be published in the 2024 Proceedings of the Cognitive Science Society
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

While narratives have shaped cognition and cultures for centuries, digital media and online social networks have introduced new narrative phenomena. With increased narrative agency, networked groups of individuals can directly contribute to and steer narratives that center our collective discussions of politics, science, and morality. We report the results of an online network experiment on narrative and hashtag generation, in which networked groups of participants interpreted a text-based narrative of a disaster event, and were incentivized to produce matching hashtags with their network neighbors. We found that network structure not only influences the emergence of dominant beliefs through coordination with network neighbors, but also impacts participants' use of causal language in their personal narratives.

[45]  arXiv:2405.20458 [pdf, other]
Title: Contingency-Aware Station-Keeping Control of Halo Orbits
Subjects: Systems and Control (eess.SY)

We present an algorithm to perform fuel-optimal stationkeeping for spacecraft in unstable halo orbits with additional constraints to ensure safety in the event of a control failure. We formulate a convex trajectory-optimization problem to generate impulsive spacecraft maneuvers to loosely track a halo orbit using a receding-horizon controller. Our solution also provides a safe exit strategy in the event that propulsion is lost at any point in the mission. We validate our algorithm in simulations of the three-body Earth-Moon and Saturn-Enceladus systems, demonstrating both low total delta-v and a safe contingency plan throughout the mission.

[46]  arXiv:2405.20459 [pdf, other]
Title: On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
Comments: 31 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Reliable usage of object detectors requires them to be calibrated -- a crucial problem that requires careful attention. Recent approaches towards this involve (1) designing new loss functions to obtain calibrated detectors by training them from scratch, and (2) post-hoc Temperature Scaling (TS) that learns to scale the likelihood of a trained detector to output calibrated predictions. These approaches are then evaluated based on a combination of Detection Expected Calibration Error (D-ECE) and Average Precision. In this work, via extensive analysis and insights, we highlight that these recent evaluation frameworks, evaluation metrics, and the use of TS have notable drawbacks leading to incorrect conclusions. As a step towards fixing these issues, we propose a principled evaluation framework to jointly measure calibration and accuracy of object detectors. We also tailor efficient and easy-to-use post-hoc calibration approaches such as Platt Scaling and Isotonic Regression specifically for the object detection task. Contrary to the common notion, our experiments show that once designed and evaluated properly, post-hoc calibrators, which are extremely cheap to build and use, are much more powerful and effective than the recent train-time calibration methods. To illustrate, D-DETR with our post-hoc Isotonic Regression calibrator outperforms the recent train-time state-of-the-art calibration method Cal-DETR by more than 7 D-ECE on the COCO dataset. Additionally, we propose improved versions of the recently proposed Localization-aware ECE and show the efficacy of our method on these metrics as well. Code is available at: https://github.com/fiveai/detection_calibration.
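
For readers unfamiliar with calibration error, the sketch below computes a generic binned Expected Calibration Error from confidence scores and binary correctness labels. It is an illustrative simplification: for detection, `correct` would mark whether a detection counts as a true positive under some matching rule, and the exact D-ECE and Localization-aware ECE definitions evaluated in the paper may differ. The function name and the choice of 10 equal-width bins are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins.

    confidences: scores in [0, 1]; correct: 1 if the prediction is right, else 0.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece

# Four predictions at 95% confidence, three of which are correct.
overconfident = expected_calibration_error([0.95] * 4, [1, 1, 1, 0])
# Four predictions at 75% confidence, three of which are correct.
calibrated = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])
```

A post-hoc calibrator such as Platt Scaling or Isotonic Regression would remap the raw confidences before this computation so that the per-bin gap shrinks.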

[47]  arXiv:2405.20461 [pdf, other]
Title: Scalable Detection of Salient Entities in News Articles
Subjects: Computation and Language (cs.CL)

News articles typically mention numerous entities, a large fraction of which are tangential to the story. Detecting the salience of entities in articles is thus important to applications such as news search, analysis and summarization. In this work, we explore new approaches for efficient and effective salient entity detection by fine-tuning pretrained transformer models with classification heads that use entity tags or contextualized entity representations directly. Experiments show that these straightforward techniques dramatically outperform prior work across datasets with varying sizes and salience definitions. We also study knowledge distillation techniques to effectively reduce the computational cost of these models without affecting their accuracy. Finally, we conduct extensive analyses and ablation experiments to characterize the behavior of the proposed models.

[48]  arXiv:2405.20462 [pdf, other]
Title: Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining
Comments: 16 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Self-supervised pretraining on large-scale satellite data has raised great interest in building Earth observation (EO) foundation models. However, many important resources beyond pure satellite imagery, such as land-cover-land-use products that provide free global semantic information, as well as vision foundation models that hold strong knowledge of the natural world, tend to be overlooked. In this work, we show these free additional resources not only help resolve common contrastive learning bottlenecks, but also significantly boost the efficiency and effectiveness of EO pretraining.
Specifically, we first propose soft contrastive learning that optimizes cross-scene soft similarity based on land-cover-generated multi-label supervision, naturally solving the issue of multiple positive samples and too strict positive matching in complex scenes. Second, we explore cross-domain continual pretraining for both multispectral and SAR imagery, building efficient EO foundation models from the strongest vision models such as DINOv2. Integrating simple weight-initialization and Siamese masking strategies into our soft contrastive learning framework, we demonstrate impressive continual pretraining performance even when the input channels and modalities are not aligned.
Without prohibitive training, we produce multispectral and SAR foundation models that achieve significantly better results in 9 out of 10 downstream tasks than most existing SOTA models. For example, our ResNet50/ViT-S achieve 84.8/85.0 linear probing mAP scores on BigEarthNet-10% which are better than most existing ViT-L models; under the same setting, our ViT-B sets a new record of 86.8 in multispectral, and 82.5 in SAR, the latter even better than many multispectral models. Dataset and models are available at https://github.com/zhu-xlab/softcon.

[49]  arXiv:2405.20465 [pdf, other]
Title: ENTIRe-ID: An Extensive and Diverse Dataset for Person Re-Identification
Comments: 5 pages, 2024 18th International Conference on Automatic Face and Gesture Recognition (FG)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The growing importance of person reidentification in computer vision has highlighted the need for more extensive and diverse datasets. In response, we introduce the ENTIRe-ID dataset, an extensive collection comprising over 4.45 million images from 37 different cameras in varied environments. This dataset is uniquely designed to tackle the challenges of domain variability and model generalization, areas where existing datasets for person re-identification have fallen short. The ENTIRe-ID dataset stands out for its coverage of a wide array of real-world scenarios, encompassing various lighting conditions, angles of view, and diverse human activities. This design ensures a realistic and robust training platform for ReID models. The ENTIRe-ID dataset is publicly available at https://serdaryildiz.github.io/ENTIRe-ID

[50]  arXiv:2405.20467 [pdf, ps, other]
Title: Performance of NPG in Countable State-Space Average-Cost RL
Comments: 23 pages
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

We consider policy optimization methods in reinforcement learning settings where the state space is arbitrarily large, or even countably infinite. The motivation arises from control problems in communication networks, matching markets, and other queueing systems. We consider Natural Policy Gradient (NPG), which is a popular algorithm for finite state spaces. Under reasonable assumptions, we derive a performance bound for NPG that is independent of the size of the state space, provided the error in policy evaluation is within a factor of the true value function. We obtain this result by establishing new policy-independent bounds on the solution to Poisson's equation, i.e., the relative value function, and by combining these bounds with previously known connections between MDPs and learning from experts.
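
The connection to learning from experts can be seen in the tabular softmax case, where an NPG step at each state takes the form of a multiplicative-weights update. The sketch below shows that update for a single state in a cost-minimization setting. The function name, the step size, and the use of exact relative Q-values are assumptions for illustration; this is the commonly stated finite-state form, not the countable-state analysis of the paper.

```python
import math

def npg_softmax_update(policy, q_values, step_size):
    """One tabular softmax NPG step at a single state, for cost minimization.

    The new probability of action a is proportional to
    policy[a] * exp(-step_size * q_values[a]),
    which is the multiplicative-weights update used in learning from experts.
    """
    weights = [p * math.exp(-step_size * q) for p, q in zip(policy, q_values)]
    norm = sum(weights)
    return [w / norm for w in weights]

# Hypothetical two-action state: action 0 has the higher relative cost.
policy = [0.5, 0.5]
q_values = [1.0, 0.0]
new_policy = npg_softmax_update(policy, q_values, step_size=1.0)
```

Probability mass shifts toward the lower-cost action, and the size of the shift depends on how large the Q-values can be, which is why bounds on the relative value function matter for the performance guarantee.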

[51]  arXiv:2405.20468 [pdf, other]
Title: Extending the Massive Text Embedding Benchmark to French
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)

In recent years, numerous embedding models have been made available and widely used for various NLP tasks. Choosing a model that performs well for several tasks in English has been largely simplified by the Massive Text Embedding Benchmark (MTEB), but extensions to other languages remain challenging. This is why we expand MTEB to propose the first massive benchmark of sentence embeddings for French. Not only do we gather 22 existing datasets in an easy-to-use interface, but we also create three new French datasets for a global evaluation over 8 different tasks. We perform a large scale comparison with 46 carefully selected embedding models, conduct comprehensive statistical tests, and analyze the correlation between model performance and many of their characteristics. We find that even if no model is the best on all tasks, large multilingual models pre-trained on sentence similarity perform particularly well. Our work comes with open-source code, new datasets and a public leaderboard.

[52]  arXiv:2405.20469 [pdf, other]
Title: Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images
Comments: Accepted at CVPR 2024 Workshop: SyntaGen-Harnessing Generative Models for Synthetic Visual Datasets. Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A long-standing challenge in developing machine learning approaches has been the lack of high-quality labeled data. Recently, models trained with purely synthetic data, here termed synthetic clones, generated using large-scale pre-trained diffusion models have shown promising results in overcoming this annotation bottleneck. As these synthetic clone models progress, they are likely to be deployed in challenging real-world settings, yet their suitability remains understudied. Our work addresses this gap by providing the first benchmark for three classes of synthetic clone models, namely supervised, self-supervised, and multi-modal ones, across a range of robustness measures. We show that existing synthetic self-supervised and multi-modal clones are comparable to or outperform state-of-the-art real-image baselines for a range of robustness metrics - shape bias, background bias, calibration, etc. However, we also find that synthetic clones are much more susceptible to adversarial and real-world noise than models trained with real data. To address this, we find that combining both real and synthetic data further increases the robustness, and that the choice of prompt used for generating synthetic images plays an important part in the robustness of synthetic clones.

[53]  arXiv:2405.20470 [pdf, other]
Title: STHN: Deep Homography Estimation for UAV Thermal Geo-localization with Satellite Imagery
Comments: 8 pages, 7 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Accurate geo-localization of Unmanned Aerial Vehicles (UAVs) is crucial for a variety of outdoor applications including search and rescue operations, power line inspections, and environmental monitoring. The vulnerability of Global Navigation Satellite Systems (GNSS) signals to interference and spoofing necessitates the development of additional robust localization methods for autonomous navigation. Visual Geo-localization (VG), leveraging onboard cameras and reference satellite maps, offers a promising solution for absolute localization. Specifically, Thermal Geo-localization (TG), which relies on image-based matching between thermal imagery and satellite databases, stands out by utilizing infrared cameras for effective night-time localization. However, the efficiency and effectiveness of current TG approaches are hindered by dense sampling on satellite maps and geometric noise in thermal query images. To overcome these challenges, in this paper, we introduce STHN, a novel UAV thermal geo-localization approach that employs a coarse-to-fine deep homography estimation method. This method attains reliable thermal geo-localization within a 512-meter radius of the UAV's last known location even with a challenging 11% overlap between satellite and thermal images, despite the presence of indistinct textures in thermal imagery and self-similar patterns in both spectra. Our research significantly enhances UAV thermal geo-localization performance and robustness against the impacts of geometric noise under low-visibility conditions in the wild. The code will be made publicly available.

[54]  arXiv:2405.20471 [pdf, other]
Title: Equivalent External Noise Temperature of Time-Varying Receivers
Comments: 12 pages, 8 figures. Submitted to IEEE Transactions on Antennas and Propagation May 30, 2024
Subjects: Systems and Control (eess.SY)

The equivalent external noise temperature of time-varying antennas is studied using the concept of cross-frequency effective aperture, which quantifies the intermodulation conversion of external noise across the frequency spectrum into a receiver's operational bandwidth. The theoretical tools for this approach are laid out following the classical method for describing external noise temperature of linear time-invariant antennas, with generalizations made along the way to capture the effects of time-varying components or materials. The results demonstrate the specific ways that a time-varying system's noise characteristics are dependent on its cross-frequency effective aperture and the broadband noise environment. The general theory is applied to several examples, including abstract models of hypothetical systems, antennas integrated with parametric amplification, and time-modulated arrays.

[55]  arXiv:2405.20477 [pdf, other]
Title: Automated Focused Feedback Generation for Scientific Writing Assistance
Comments: Accepted to ACL 2024 (Findings)
Subjects: Computation and Language (cs.CL)

Scientific writing is a challenging task, particularly for novice researchers who often rely on feedback from experienced peers. Recent work has primarily focused on improving surface form and style rather than manuscript content. In this paper, we propose a novel task: automated focused feedback generation for scientific writing assistance. We present SWIF$^{2}$T: a Scientific WrIting Focused Feedback Tool. It is designed to generate specific, actionable and coherent comments, which identify weaknesses in a scientific paper and/or propose revisions to it. Our approach consists of four components - planner, investigator, reviewer and controller - leveraging multiple Large Language Models (LLMs) to implement them. We compile a dataset of 300 peer reviews citing weaknesses in scientific papers and conduct human evaluation. The results demonstrate the superiority in specificity, reading comprehension, and overall helpfulness of SWIF$^{2}$T's feedback compared to other approaches. In our analysis, we also identified cases where automatically generated reviews were judged better than human ones, suggesting opportunities for integration of AI-generated feedback in scientific writing.

[56]  arXiv:2405.20481 [pdf, other]
Title: On the randomized Euler scheme for SDEs with integral-form drift
Subjects: Numerical Analysis (math.NA); Probability (math.PR)

In this paper, we investigate the problem of strong approximation of the solution of SDEs in the case when the drift coefficient is given in the integral form. Such drift often appears when analyzing stochastic dynamics of optimization procedures in machine learning problems. We discuss connections of the defined randomized Euler approximation scheme with the perturbed version of the stochastic gradient descent (SGD) algorithm. We investigate its upper error bounds, in terms of the discretization parameter n and the size M of the random sample drawn at each step of the algorithm, in different subclasses of coefficients of the underlying SDE. Finally, the results of numerical experiments performed by using GPU architecture are also reported.
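
The sketch below illustrates the general idea of an Euler scheme whose integral-form drift is replaced at each step by an average over a fresh random sample of size M, with n equal time steps. It assumes a scalar, time-homogeneous SDE with drift a(x) = E[f(x, Y)], and the names `f`, `sigma`, and `sample_y` are hypothetical. It is not necessarily the exact randomized scheme analyzed in the paper.

```python
import math
import random

def randomized_euler(f, sigma, x0, T, n, M, sample_y, rng):
    """Approximate X(T) for dX = a(X) dt + sigma(X) dW, with a(x) = E[f(x, Y)].

    At each of the n steps the drift is estimated by averaging f(x, y)
    over M freshly drawn samples y, as in a mini-batch gradient estimate.
    """
    h = T / n
    x = x0
    for _ in range(n):
        ys = [sample_y(rng) for _ in range(M)]
        drift = sum(f(x, y) for y in ys) / M   # Monte Carlo drift estimate
        dW = rng.gauss(0.0, math.sqrt(h))      # Brownian increment over one step
        x = x + drift * h + sigma(x) * dW
    return x

# Hypothetical example: f(x, y) = -(x - y) with Y standard normal, so a(x) = -x.
rng = random.Random(0)
x_T = randomized_euler(
    f=lambda x, y: -(x - y),
    sigma=lambda x: 0.1,
    x0=1.0, T=1.0, n=100, M=16,
    sample_y=lambda r: r.gauss(0.0, 1.0),
    rng=rng,
)
```

With f taken as a negative per-sample gradient, each step resembles a perturbed SGD update with batch size M and learning rate T / n, which is the connection the abstract refers to.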

[57]  arXiv:2405.20482 [pdf, other]
Title: Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Many causal systems such as biological processes in cells can only be observed indirectly via measurements, such as gene expression. Causal representation learning -- the task of correctly mapping low-level observations to latent causal variables -- could advance scientific understanding by enabling inference of latent variables such as pathway activation. In this paper, we develop methods for inferring latent variables from multiple related datasets (environments) and tasks. As a running example, we consider the task of predicting a phenotype from gene expression, where we often collect data from multiple cell types or organisms that are related in known ways. The key insight is that the mapping from latent variables driven by gene expression to the phenotype of interest changes sparsely across closely related environments. To model sparse changes, we introduce Tree-Based Regularization (TBR), an objective that minimizes prediction error while regularizing closely related environments to learn similar predictors. We prove that under assumptions about the degree of sparse changes, TBR identifies the true latent variables up to some simple transformations. We evaluate the theory empirically with both simulations and ground-truth gene expression data. We find that TBR recovers the latent causal variables better than related methods across these settings, even under settings that violate some assumptions of the theory.

[58]  arXiv:2405.20483 [pdf, other]
Title: Hiding Your Awful Online Choices Made More Efficient and Secure: A New Privacy-Aware Recommender System
Subjects: Cryptography and Security (cs.CR)

Recommender systems are an integral part of online platforms that recommend new content to users with similar interests. However, they demand a considerable amount of user activity data which, if not adequately protected, constitutes a critical threat to user privacy. Privacy-aware recommender systems enable protection of such sensitive user data while still maintaining a similar recommendation accuracy compared to the traditional non-private recommender systems. However, current privacy-aware recommender systems suffer from a significant trade-off between privacy and computational efficiency. For instance, it is well known that architectures that rely purely on cryptographic primitives offer the most robust privacy guarantees; however, they suffer from substantial computational and network overhead. Thus, it is crucial to improve this trade-off for better performance. This paper presents a novel privacy-aware recommender system that combines privacy-aware machine learning algorithms for practical scalability and efficiency with cryptographic primitives like Homomorphic Encryption and Multi-Party Computation - without assumptions like trusted-party or secure hardware - for solid privacy guarantees. Experiments on standard benchmark datasets show that our approach results in time and memory gains by three orders of magnitude compared to using cryptographic primitives alone to construct a privacy-aware recommender system. Furthermore, for the first time our method makes it feasible to compute private recommendations for datasets containing 100 million entries, even on memory-constrained low-power SOC (System on Chip) devices.

[59]  arXiv:2405.20485 [pdf, other]
Title: Phantom: General Trigger Attacks on Retrieval Augmented Language Generation
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG)

Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs) in chatbot applications, enabling developers to adapt and personalize the LLM output without expensive training or fine-tuning. RAG systems use an external knowledge database to retrieve the most relevant documents for a given query, providing this context to the LLM generator. While RAG achieves impressive utility in many applications, its adoption to enable personalized generative models introduces new security risks. In this work, we propose new attack surfaces for an adversary to compromise a victim's RAG system, by injecting a single malicious document in its knowledge database. We design Phantom, a general two-step attack framework against RAG augmented LLMs. The first step involves crafting a poisoned document designed to be retrieved by the RAG system within the top-k results only when an adversarial trigger, a specific sequence of words acting as a backdoor, is present in the victim's queries. In the second step, a specially crafted adversarial string within the poisoned document triggers various adversarial attacks in the LLM generator, including denial of service, reputation damage, privacy violations, and harmful behaviors. We demonstrate our attacks on multiple LLM architectures, including Gemma, Vicuna, and Llama.

[60]  arXiv:2405.20486 [pdf, other]
Title: Policy Trees for Prediction: Interpretable and Adaptive Model Selection for Machine Learning
Comments: Submitted to JMLR on 5/30/2024
Subjects: Machine Learning (cs.LG)

As a multitude of capable machine learning (ML) models become widely available in forms such as open-source software and public APIs, central questions remain regarding their use in real-world applications, especially in high-stakes decision-making. Is there always one best model that should be used? When are the models likely to be error-prone? Should a black-box or interpretable model be used? In this work, we develop a prescriptive methodology to address these key questions, introducing a tree-based approach, Optimal Predictive-Policy Trees (OP2T), that yields interpretable policies for adaptively selecting a predictive model or ensemble, along with a parameterized option to reject making a prediction. We base our methods on learning globally optimized prescriptive trees. Our approach enables interpretable and adaptive model selection and rejection while only assuming access to model outputs. By learning policies over different feature spaces, including the model outputs, our approach works with both structured and unstructured datasets. We evaluate our approach on real-world datasets, including regression and classification tasks with both structured and unstructured data. We demonstrate that our approach provides strong performance against baseline methods while yielding insights that help answer critical questions about which models to use, and when.

[61]  arXiv:2405.20487 [pdf, ps, other]
Title: Probabilities of Causation for Continuous and Vector Variables
Subjects: Artificial Intelligence (cs.AI)

Probabilities of causation (PoC) are valuable concepts for explainable artificial intelligence and practical decision-making. PoC are originally defined for scalar binary variables. In this paper, we extend the concept of PoC to continuous treatment and outcome variables, and further generalize PoC to capture causal effects between multiple treatments and multiple outcomes. In addition, we consider PoC for a sub-population and PoC with multi-hypothetical terms to capture more sophisticated counterfactual information useful for decision-making. We provide a nonparametric identification theorem for each type of PoC we introduce. Finally, we illustrate the application of our results on a real-world dataset about education.

[62]  arXiv:2405.20488 [pdf, other]
Title: Shoal++: High Throughput DAG BFT Can Be Fast!
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Today's practical partially synchronous Byzantine Fault Tolerant (BFT) consensus protocols trade off low latency and high throughput. On the one end, traditional BFT protocols such as PBFT and its derivatives optimize for latency. They require, in fault-free executions, only 3 message exchanges to commit, the optimum for BFT consensus. However, this class of protocols typically relies on a single leader, hampering throughput scalability. On the other end, a new class of so-called DAG-BFT protocols demonstrates how to achieve highly scalable throughput by separating data dissemination from consensus, and using every replica as proposer. Unfortunately, existing DAG-BFT protocols pay a steep latency premium, requiring on average 10.5 message exchanges to commit a transaction.
This work aims to soften this tension and proposes Shoal++, a novel DAG-based BFT consensus system that offers the throughput of DAGs while reducing commit latency to an average of 4.5 message exchanges. Our empirical findings are encouraging, showing that Shoal++ achieves throughput comparable to state-of-the-art DAG BFT solutions while reducing latency by up to 60%.
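
As a quick consistency check on the figures quoted above, the snippet below computes the relative reduction in average message exchanges when going from 10.5 to 4.5. The helper name is hypothetical, and the calculation concerns only the message-exchange counts: the reported latency reduction of up to 60% is an empirical measurement and need not equal this ratio.

```python
def relative_reduction(before, after):
    """Fraction by which a quantity shrinks when going from before to after."""
    return (before - after) / before

# Average message exchanges to commit, as quoted in the abstract.
existing_dag_bft = 10.5
shoal_pp = 4.5
traditional_bft = 3.0

# Saving relative to existing DAG-BFT protocols: 6 / 10.5.
reduction = relative_reduction(existing_dag_bft, shoal_pp)
# Remaining gap to the 3-exchange optimum, in message exchanges.
gap_to_optimum = shoal_pp - traditional_bft
```

The ratio comes out just under three fifths, which is in line with the stated latency improvement, while the remaining 1.5-exchange gap is the price paid for keeping DAG-level throughput.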

[63]  arXiv:2405.20489 [pdf, other]
Title: Stability-Constrained Learning for Frequency Regulation in Power Grids with Variable Inertia
Comments: This paper is to appear in IEEE Control Systems Letters (L-CSS)
Subjects: Systems and Control (eess.SY)

The increasing penetration of converter-based renewable generation has resulted in faster frequency dynamics, and low and variable inertia. As a result, there is a need for frequency control methods that are able to stabilize a disturbance in the power system at timescales comparable to the fast converter dynamics. This paper proposes a combined linear and neural network controller for inverter-based primary frequency control that is stable at time-varying levels of inertia. We model the time-variance in inertia via a switched affine hybrid system model. We derive stability certificates for the proposed controller via a quadratic candidate Lyapunov function. We test the proposed control on a 12-bus 3-area test network, and compare its performance with a base case linear controller, optimized linear controller, and finite-horizon Linear Quadratic Regulator (LQR). Our proposed controller achieves faster mean settling time and over 50% reduction in average control cost across $100$ inertia scenarios compared to the optimized linear controller. Unlike LQR which requires complete knowledge of the inertia trajectories and system dynamics over the entire control time horizon, our proposed controller is real-time tractable, and achieves comparable performance to LQR.

[64]  arXiv:2405.20494 [pdf, other]
Title: Slight Corruption in Pre-training Data Makes Better Diffusion Models
Comments: 50 pages, 33 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Diffusion models (DMs) have shown remarkable capabilities in generating realistic high-quality images, audio, and video. They benefit significantly from extensive pre-training on large-scale datasets, including web-crawled data with paired data and conditions, such as image-text and image-class pairs. Despite rigorous filtering, these pre-training datasets often inevitably contain corrupted pairs where conditions do not accurately describe the data. This paper presents the first comprehensive study on the impact of such corruption in pre-training data of DMs. We synthetically corrupt ImageNet-1K and CC3M to pre-train and evaluate over 50 conditional DMs. Our empirical findings reveal that various types of slight corruption in pre-training can significantly enhance the quality, diversity, and fidelity of the generated images across different DMs, both during pre-training and downstream adaptation stages. Theoretically, we consider a Gaussian mixture model and prove that slight corruption in the condition leads to higher entropy and a reduced 2-Wasserstein distance to the ground truth of the data distribution generated by the corruptly trained DMs. Inspired by our analysis, we propose a simple method to improve the training of DMs on practical datasets by adding condition embedding perturbations (CEP). CEP significantly improves the performance of various DMs in both pre-training and downstream tasks. We hope that our study provides new insights into understanding the data and pre-training processes of DMs.

[65]  arXiv:2405.20495 [pdf, other]
Title: Transfer Q Star: Principled Decoding for LLM Alignment
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to an optimal Q-function ($Q^*$), which is often unavailable in practice. Hence, prior SoTA methods either approximate this $Q^*$ using $Q^{\pi_{\texttt{sft}}}$ (derived from the reference $\texttt{SFT}$ model) or rely on short-term rewards, resulting in sub-optimal decoding performance. In this work, we propose Transfer $Q^*$, which implicitly estimates the optimal value function for a target reward $r$ through a baseline model $\rho_{\texttt{BL}}$ aligned with a baseline reward $r_{\texttt{BL}}$ (which can be different from the target reward $r$). Theoretical analyses of Transfer $Q^*$ provide a rigorous characterization of its optimality, deriving an upper bound on the sub-optimality gap and identifying a hyperparameter to control the deviation from the pre-trained reference $\texttt{SFT}$ model based on user needs. Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods and demonstrates superior empirical performance across key metrics such as coherence, diversity, and quality in extensive tests on several synthetic and real datasets.
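
As a rough illustration of alignment via decoding, the sketch below reweights a reference next-token distribution by exponentiated Q-values, which is the generic KL-regularized form (new probability proportional to reference probability times exp(Q / alpha)). It is not the Transfer $Q^*$ estimator itself: the function name `reweight`, the toy tokens, the Q-values, and the temperature-like parameter `alpha` are all assumptions made for the example.

```python
import math

def reweight(ref_probs, q_values, alpha):
    """Return p(tok) proportional to ref_probs[tok] * exp(q_values[tok] / alpha).

    ref_probs: dict token -> probability under the reference (SFT) model.
    q_values:  dict token -> estimated value of choosing that token (hypothetical).
    alpha:     positive scalar; larger values stay closer to the reference model.
    """
    logits = {
        tok: math.log(p) + q_values[tok] / alpha
        for tok, p in ref_probs.items()
        if p > 0.0
    }
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits.values())
    z = sum(math.exp(v - m) for v in logits.values())
    return {tok: math.exp(v - m) / z for tok, v in logits.items()}

if __name__ == "__main__":
    ref = {"yes": 0.5, "no": 0.5}   # toy reference distribution
    q = {"yes": 1.0, "no": 0.0}     # toy value estimates
    print(reweight(ref, q, alpha=1.0))
    print(reweight(ref, q, alpha=100.0))  # close to the reference distribution
```

In this generic form, a large `alpha` leaves the reference distribution nearly unchanged, while a small `alpha` concentrates mass on high-value tokens.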

[66]  arXiv:2405.20496 [pdf, other]
Title: Investigations into Uncertain Control Co-Design Implementations for stochastic in expectation and worst-case robust
Comments: 16 pages and 8 figures
Subjects: Systems and Control (eess.SY)

As uncertainty considerations become increasingly important aspects of concurrent plant and control optimization, it is imperative to identify and compare the impact of uncertain control co-design (UCCD) formulations on their associated solutions. While previous work has developed the theory for various UCCD formulations, their implementation, along with an in-depth discussion of the structure of UCCD problems, implicit assumptions, method-dependent considerations, and practical insights, is currently missing from the literature. Therefore, in this study, we address some of these limitations by proposing two optimal control structures for UCCD problems that we refer to as the open-loop single-control (OLSC) and open-loop multiple-control (OLMC). Next, we implement the stochastic in expectation UCCD (SE-UCCD) and worst-case robust UCCD (WCR-UCCD) for a simplified strain-actuated solar array (SASA) case study. For the implementation of SE-UCCD, we use generalized Polynomial Chaos expansion and benchmark the results against Monte Carlo Simulation. Next, we solve a simple SASA WCR-UCCD through OLSC and OLMC structures. Insights from such implementations indicate that constructing, implementing, and solving a UCCD problem requires an in-depth understanding of the problem at hand, formulations, and solution strategies to best address the underlying co-design under uncertainty questions.

[67]  arXiv:2405.20501 [pdf, other]
Title: ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane
Comments: 8 pages, 14 figures and charts
Journal-ref: In AAMAS (pp. 1514-1523) 2023
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

The ability to shop independently, especially in grocery stores, is important for maintaining a high quality of life. This can be particularly challenging for people with visual impairments (PVI). Stores carry thousands of products, with approximately 30,000 new products introduced each year in the US market alone, presenting a challenge even for modern computer vision solutions. Through this work, we present a proof-of-concept socially assistive robotic system we call ShelfHelp, and propose novel technical solutions for enhancing instrumented canes traditionally meant for navigation tasks with additional capability within the domain of shopping. ShelfHelp includes a novel visual product locator algorithm designed for use in grocery stores and a novel planner that autonomously issues verbal manipulation guidance commands to guide the user during product retrieval. Through a human subjects study, we show the system's success in locating and providing effective manipulation guidance to retrieve desired products with novice users. We compare two autonomous verbal guidance modes achieving comparable performance to a human assistance baseline and present encouraging findings that validate our system's efficiency and effectiveness through positive subjective metrics including competence, intelligence, and ease of use.

[68]  arXiv:2405.20502 [pdf, ps, other]
Title: Reach-Avoid Control Synthesis for a Quadrotor UAV with Formal Safety Guarantees
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)

Reach-avoid specifications are one of the most common tasks in unmanned aerial vehicle (UAV) applications. Despite the intensive research and development associated with control of aerial vehicles, generating feasible trajectories through complex environments and tracking them with formal safety guarantees remain challenging. In this paper, we propose a control framework for a quadrotor UAV that enables accomplishing reach-avoid tasks with formal safety guarantees. In this proposed framework, we integrate geometric control theory for tracking and polynomial trajectory generation using Bezier curves, where tracking errors are accounted for in the trajectory synthesis process. To estimate the tracking errors, we revisit the stability analysis of the closed-loop quadrotor system, when geometric control is implemented. We show that the tracking error dynamics exhibit local exponential stability when geometric control is implemented with any positive control gains, and we derive tight uniform bounds of the tracking error. We also introduce sufficient conditions to be imposed on the desired trajectory utilizing the derived uniform bounds to ensure the well-definedness of the closed-loop system. For the trajectory synthesis, we present an efficient algorithm that enables constructing a safe tube by means of sampling-based planning and safe hyper-rectangular set computations. Then, we compute the trajectory, given as a piecewise continuous Bezier curve, through the safe tube, where a heuristic efficient approach that utilizes iterative linear programming is employed. We present extensive numerical simulations with a cluttered environment to illustrate the effectiveness of the proposed framework in reach-avoid planning scenarios.
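
For readers unfamiliar with Bezier curves, the following minimal sketch evaluates one with de Casteljau's algorithm. It only illustrates the curve representation, not the paper's trajectory synthesis; the function name `de_casteljau` and the control points are hypothetical. A Bezier curve always lies in the convex hull of its control points, so keeping the control points inside a convex safe set (such as a hyper-rectangle) keeps the whole segment inside it; whether the paper exploits the property in exactly this way is an assumption here.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1].

    control_points: sequence of points, each a tuple of coordinates.
    Repeatedly interpolates between neighbouring points until one remains.
    """
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [
            tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]

if __name__ == "__main__":
    # Hypothetical quadratic segment in the plane.
    ctrl = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(t, de_casteljau(ctrl, t))
```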

[69]  arXiv:2405.20503 [pdf, ps, other]
Title: Optimizing cnn-Bigru performance: Mish activation and comparative analysis with Relu
Journal-ref: International Journal of Computer Networks & Communications (IJCNC) Vol.16, No.3, May 2024
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Deep learning is currently extensively employed across a range of research domains. The continuous advancements in deep learning techniques contribute to solving intricate challenges. Activation functions (AF) are fundamental components within neural networks, enabling them to capture complex patterns and relationships in the data. By introducing non-linearities, AF empowers neural networks to model and adapt to the diverse and nuanced nature of real-world data, enhancing their ability to make accurate predictions across various tasks. In the context of intrusion detection, the Mish, a recent AF, was implemented in the CNN-BiGRU model, using three datasets: ASNM-TUN, ASNM-CDX, and HOGZILLA. The comparison with Rectified Linear Unit (ReLU), a widely used AF, revealed that Mish outperforms ReLU, showcasing superior performance across the evaluated datasets. This study illuminates the effectiveness of AF in elevating the performance of intrusion detection systems.
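
For reference, Mish is commonly defined as x * tanh(softplus(x)), whereas ReLU is max(x, 0). The sketch below compares the two on a few scalar inputs; it is a standalone illustration in plain Python, not the CNN-BiGRU implementation from the paper, and the helper names are chosen for this example only.

```python
import math

def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    # Mish(x) = x * tanh(softplus(x)); smooth, and slightly negative for x < 0.
    return x * math.tanh(softplus(x))

def relu(x: float) -> float:
    # ReLU(x) = max(x, 0); exactly zero for all negative inputs.
    return max(x, 0.0)

if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):.4f}")
```

Unlike ReLU, Mish passes a small negative signal for negative inputs and is differentiable everywhere, which is the usual motivation given for trying it as a drop-in replacement.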

[70]  arXiv:2405.20504 [pdf, other]
Title: FCOM: A Federated Collaborative Online Monitoring Framework via Representation Learning
Subjects: Machine Learning (cs.LG)

Online learning has demonstrated notable potential to dynamically allocate limited resources to monitor a large population of processes, effectively balancing the exploitation of processes yielding high rewards, and the exploration of uncertain processes. However, most online learning algorithms were designed under 1) a centralized setting that requires data sharing across processes to obtain an accurate prediction or 2) a homogeneity assumption that estimates a single global model from the decentralized data. To facilitate the online learning of heterogeneous processes from the decentralized data, we propose a federated collaborative online monitoring method, which captures the latent representative models inherent in the population through representation learning and designs a novel federated collaborative UCB algorithm to estimate the representative models from sequentially observed decentralized data. The efficiency of our method is illustrated through theoretical analysis, simulation studies, and decentralized cognitive degradation monitoring in Alzheimer's disease.
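
To make the exploration-exploitation trade-off concrete, here is a minimal sketch of a classical UCB1-style index used to pick which processes to monitor under a limited budget. This is the textbook single-agent rule, not the federated collaborative UCB algorithm proposed in the paper; the function names, the constant `c`, and the toy numbers are assumptions.

```python
import math

def ucb_scores(means, counts, t, c=2.0):
    """Upper confidence bound per process: mean + sqrt(c * ln(t) / n).

    means:  empirical mean reward of each process.
    counts: number of times each process has been observed.
    t:      current round (total observations so far), t >= 1.
    Unobserved processes get an infinite score so they are explored first.
    """
    scores = []
    for mean, n in zip(means, counts):
        if n == 0:
            scores.append(math.inf)
        else:
            scores.append(mean + math.sqrt(c * math.log(t) / n))
    return scores

def select_top_k(scores, k):
    # Indices of the k highest-scoring processes, returned in ascending order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])

if __name__ == "__main__":
    means, counts = [0.9, 0.5, 0.1], [10, 1, 0]
    scores = ucb_scores(means, counts, t=11)
    print(scores, select_top_k(scores, k=2))
```

In the toy run, the rarely observed processes outrank the well-observed high-mean one, which shows how the confidence term drives exploration.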

[71]  arXiv:2405.20505 [pdf, other]
Title: SPOT: Text Source Prediction from Originality Score Thresholding
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

The wide acceptance of large language models (LLMs) has unlocked new applications and social risks. Popular countermeasures aim at detecting misinformation and usually involve domain-specific models trained to recognize the relevance of any information. Instead of evaluating the validity of the information, we propose to investigate LLM-generated text from the perspective of trust. In this study, we define trust as the ability to know if an input text was generated by an LLM or a human. To do so, we design SPOT, an efficient method that classifies the source of any standalone text input based on an originality score. This score is derived from the prediction of a given LLM to detect other LLMs. We empirically demonstrate the robustness of the method to the architecture, training data, evaluation data, task and compression of modern LLMs.

[72]  arXiv:2405.20508 [pdf, other]
Title: MyWeekInSight: Designing and Evaluating the Use of Visualization in Self-Management of Chronic Pain by Youth
Subjects: Human-Computer Interaction (cs.HC)

A teenager's experience of chronic pain reverberates through multiple interacting aspects of their lives. To self-manage their symptoms, they need to understand how factors such as their sleep, social interactions, emotions and pain intersect; supporting this capability must underlie an effective personalized healthcare solution. While adult use of personal informatics for self-management of various health factors has been studied, solutions intended for adults are rarely workable for teens, who face this complex and confusing situation with unique perspectives, skills and contexts. In this design study, we explore a means of facilitating self-reflection by youth living with chronic pain, through visualization of their personal health data. In collaboration with pediatric chronic pain clinicians and a health-tech industry partner, we designed and deployed MyWeekInSight, a visualization-based self-reflection tool for youth with chronic pain. We discuss our staged design approach with this intersectionally vulnerable population, in which we balanced reliance on proxy users and data with feedback from youth viewing their own data. We report on extensive formative and in-situ evaluation, including a three-week clinical deployment, and present a framework of challenges and barriers faced in clinical deployment with mitigations that can aid fellow researchers. Our reflections on the design process yield principles, surprises, and open questions.

[73]  arXiv:2405.20509 [pdf, ps, other]
Title: An FBG-based Stiffness Estimation Sensor for In-vivo Diagnostics
Comments: 6 pages (excluding the references), 5 figures
Subjects: Robotics (cs.RO)

In-vivo tissue stiffness identification can be useful in pulmonary fibrosis diagnostics and minimally invasive tumor identification, among many other applications. In this work, we propose a palpation-based method for tissue stiffness estimation that uses a sensorized beam buckled onto the surface of a tissue. Fiber Bragg Gratings (FBGs) are used in our sensor as a shape-estimation modality to get real-time beam shape, even while the device is not visually monitored. A mechanical model is developed to predict the behavior of a buckling beam and is validated using finite element analysis and bench-top testing with phantom tissue samples (made of PDMS and PA-Gel). Bench-top estimations were conducted and the results were compared with the actual stiffness values. Mean RMSE and standard deviation (from the actual stiffnesses) values of 413.86 kPa and 313.82 kPa were obtained. Estimations for softer samples were relatively closer to the actual values. Ultimately, we used the stiffness sensor within a mock concentric tube robot as a demonstration of in-vivo sensor feasibility. Bench-top trials with and without the robot demonstrate the effectiveness of this unique sensing modality in in-vivo applications.

[74]  arXiv:2405.20510 [pdf, other]
Title: Physically Compatible 3D Object Modeling from a Single Image
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a computational framework that transforms single images into 3D physical objects. The visual geometry of a physical object in an image is determined by three orthogonal attributes: mechanical properties, external forces, and rest-shape geometry. Existing single-view 3D reconstruction methods often overlook this underlying composition, presuming rigidity or neglecting external forces. Consequently, the reconstructed objects fail to withstand real-world physical forces, resulting in instability or undesirable deformation -- diverging from their intended designs as depicted in the image. Our optimization framework addresses this by embedding physical compatibility into the reconstruction process. We explicitly decompose the three physical attributes and link them through static equilibrium, which serves as a hard constraint, ensuring that the optimized physical shapes exhibit desired physical behaviors. Evaluations on a dataset collected from Objaverse demonstrate that our framework consistently enhances the physical realism of 3D models over existing methods. The utility of our framework extends to practical applications in dynamic simulations and 3D printing, where adherence to physical compatibility is paramount.

[75]  arXiv:2405.20512 [pdf, other]
Title: How Multilingual Are Large Language Models Fine-Tuned for Translation?
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

A new paradigm for machine translation has recently emerged: fine-tuning large language models (LLMs) on parallel text has been shown to outperform dedicated translation systems trained in a supervised fashion on much larger amounts of parallel data (Xu et al., 2024a; Alves et al., 2024). However, it remains unclear whether this paradigm can enable massively multilingual machine translation or whether it requires fine-tuning dedicated models for a small number of language pairs. How does translation fine-tuning impact the MT capabilities of LLMs for zero-shot languages, zero-shot language pairs, and translation tasks that do not involve English? To address these questions, we conduct an extensive empirical evaluation of the translation quality of the TOWER family of language models (Alves et al., 2024) on 132 translation tasks from the multi-parallel FLORES-200 data. We find that translation fine-tuning improves translation quality even for zero-shot languages on average, but that the impact is uneven depending on the language pairs involved. These results call for further research to effectively enable massively multilingual translation with LLMs.

[76]  arXiv:2405.20513 [pdf, other]
Title: Deep Modeling of Non-Gaussian Aleatoric Uncertainty
Comments: 8 pages, 7 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Deep learning offers promising new ways to accurately model aleatoric uncertainty in robotic estimation systems, particularly when the uncertainty distributions do not conform to traditional assumptions of being fixed and Gaussian. In this study, we formulate and evaluate three fundamental deep learning approaches for conditional probability density modeling to quantify non-Gaussian aleatoric uncertainty: parametric, discretized, and generative modeling. We systematically compare the respective strengths and weaknesses of these three methods on simulated non-Gaussian densities as well as on real-world terrain-relative navigation data. Our results show that these deep learning methods can accurately capture complex uncertainty patterns, highlighting their potential for improving the reliability and robustness of estimation systems.

[77]  arXiv:2405.20516 [pdf, other]
Title: WaveCastNet: An AI-enabled Wavefield Forecasting Framework for Earthquake Early Warning
Subjects: Machine Learning (cs.LG); Geophysics (physics.geo-ph)

Large earthquakes can be destructive and quickly wreak havoc on a landscape. To mitigate immediate threats, early warning systems have been developed to alert residents, emergency responders, and critical infrastructure operators seconds to a minute before seismic waves arrive. These warnings provide time to take precautions and prevent damage. The success of these systems relies on fast, accurate predictions of ground motion intensities, which is challenging due to the complex physics of earthquakes, wave propagation, and their intricate spatial and temporal interactions. To improve early warning, we propose a novel AI-enabled framework, WaveCastNet, for forecasting ground motions from large earthquakes. WaveCastNet integrates a novel convolutional Long Expressive Memory (ConvLEM) model into a sequence to sequence (seq2seq) forecasting framework to model long-term dependencies and multi-scale patterns in both space and time. WaveCastNet, which shares weights across spatial and temporal dimensions, requires fewer parameters compared to more resource-intensive models like transformers and thus, in turn, reduces inference times. Importantly, WaveCastNet also generalizes better than transformer-based models to different seismic scenarios, including to more rare and critical situations with higher magnitude earthquakes. Our results using simulated data from the San Francisco Bay Area demonstrate the capability to rapidly predict the intensity and timing of destructive ground motions. Importantly, our proposed approach does not require estimating earthquake magnitudes and epicenters, which are prone to errors using conventional approaches; nor does it require empirical ground motion models, which fail to capture strongly heterogeneous wave propagation effects.

[78]  arXiv:2405.20519 [pdf, other]
Title: Diffusion On Syntax Trees For Program Synthesis
Subjects: Artificial Intelligence (cs.AI)

Large language models generate code one token at a time. Their autoregressive generation process lacks the feedback of observing the program's output. Training LLMs to suggest edits directly can be challenging due to the scarcity of rich edit data. To address these problems, we propose neural diffusion models that operate on syntax trees of any context-free grammar. Similar to image diffusion models, our method also inverts "noise" applied to syntax trees. Rather than generating code sequentially, we iteratively edit it while preserving syntactic validity, which makes it easy to combine this neural model with search. We apply our approach to inverse graphics tasks, where our model learns to convert images into programs that produce those images. Combined with search, our model is able to write graphics programs, see the execution result, and debug them to meet the required specifications. We additionally show how our system can write graphics programs for hand-drawn sketches.

[79]  arXiv:2405.20521 [pdf, other]
Title: SoK: Public Blockchain Sharding
Comments: 18 pages
Subjects: Cryptography and Security (cs.CR)

Blockchain's decentralization, transparency, and tamper-resistance properties have facilitated the system's use in various application fields. However, the low throughput and high confirmation latency hinder the widespread adoption of Blockchain. Many solutions have been proposed to address these issues, including first-layer solutions (or on-chain solutions) and second-layer solutions (or off-chain solutions). Among the proposed solutions, the blockchain sharding system is the most scalable one, where the nodes in the network are divided into several groups. The nodes in different shards work in parallel to validate the transactions and add them to the blocks, and in such a way, the throughput increases significantly. However, previous works have not adequately summarized the latest achievements in blockchain sharding, nor have they fully showcased its state-of-the-art. Our study provides a systemization of knowledge of public blockchain sharding, including the core components of sharding systems, challenges, limitations, and mechanisms of the latest sharding protocols. We also compare their performance and discuss current constraints and future research directions.

[80]  arXiv:2405.20524 [pdf, other]
Title: Practical implementation of geometric quasi-cyclic LDPC codes
Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM); Combinatorics (math.CO)

We detail for the first time a complete explicit description of the quasi-cyclic structure of all classical finite generalized quadrangles. Using these descriptions we construct families of quasi-cyclic LDPC codes derived from the point-line incidence matrix of the quadrangles by explicitly calculating quasi-cyclic generator and parity check matrices for these codes. This allows us to construct parity check and generator matrices of all such codes of length up to 400000. These codes cover a wide range of transmission rates, are easy and fast to implement and perform close to Shannon's limit with no visible error floors. We also include some performance data for these codes. Furthermore, we include a complete explicit description of the quasi-cyclic structure of the point-line and point-hyperplane incidences of the finite projective and affine spaces.

[81]  arXiv:2405.20525 [pdf, other]
Title: Comparing Quantum Annealing and Spiking Neuromorphic Computing for Sampling Binary Sparse Coding QUBO Problems
Subjects: Emerging Technologies (cs.ET); Computer Vision and Pattern Recognition (cs.CV); Discrete Mathematics (cs.DM); Neural and Evolutionary Computing (cs.NE); Quantum Physics (quant-ph)

We consider the problem of computing a sparse binary representation of an image. To be precise, given an image and an overcomplete, non-orthonormal basis, we aim to find a sparse binary vector indicating the minimal set of basis vectors that when added together best reconstruct the given input. We formulate this problem with an $L_2$ loss on the reconstruction error, and an $L_0$ (or, equivalently, an $L_1$) loss on the binary vector enforcing sparsity. This yields a quadratic unconstrained binary optimization (QUBO) problem, whose optimal solution(s) are in general NP-hard to find. The method of unsupervised and unnormalized dictionary feature learning for a desired sparsity level to best match the data is presented. Next, we solve the sparse representation QUBO by implementing it both on a D-Wave quantum annealer with Pegasus chip connectivity via minor embedding, as well as on the Intel Loihi 2 spiking neuromorphic processor. On the quantum annealer, we sample from the sparse representation QUBO using parallel quantum annealing combined with quantum evolution Monte Carlo, also known as iterated reverse annealing. On Loihi 2, we use a stochastic winner-take-all network of neurons. The solutions are benchmarked against simulated annealing, a classical heuristic, and the optimal solutions are computed using CPLEX. Iterated reverse quantum annealing performs similarly to simulated annealing, although simulated annealing is always able to sample the optimal solution whereas quantum annealing was not always able to. The Loihi 2 solutions that are sampled are on average more sparse than the solutions from any of the other methods. Loihi 2 outperforms a standard linear-schedule anneal on the D-Wave quantum annealer, while iterated reverse quantum annealing performs much better than both unmodified linear-schedule quantum annealing and iterated warm starting on Loihi 2.
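
The reduction to a QUBO can be sketched for a tiny instance. Writing the objective as ||x - D a||^2 + lam * sum(a) with binary a and using a_i^2 = a_i, the diagonal of Q becomes d_i.d_i - 2 d_i.x + lam and the off-diagonal entries become d_i.d_j, up to the constant x.x. The code below builds Q this way and solves it by brute force; the dictionary, the input, the penalty `lam`, and all function names are made up for illustration, and brute force merely stands in for the annealing and neuromorphic samplers used in the paper.

```python
from itertools import product

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def build_qubo(x, atoms, lam):
    """Symmetric Q such that a^T Q a + x.x == ||x - sum_i a_i d_i||^2 + lam * sum(a)."""
    n = len(atoms)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Linear terms fold into the diagonal because a_i^2 == a_i.
                Q[i][i] = dot(atoms[i], atoms[i]) - 2.0 * dot(atoms[i], x) + lam
            else:
                Q[i][j] = dot(atoms[i], atoms[j])
    return Q

def qubo_energy(Q, a):
    n = len(a)
    return sum(Q[i][j] * a[i] * a[j] for i in range(n) for j in range(n))

def direct_objective(x, atoms, a, lam):
    # Reconstruction error plus sparsity penalty, computed without the QUBO form.
    recon = [sum(a[k] * atoms[k][d] for k in range(len(atoms))) for d in range(len(x))]
    residual = [xi - ri for xi, ri in zip(x, recon)]
    return dot(residual, residual) + lam * sum(a)

def brute_force(Q):
    # Exhaustive search over all 2^n binary vectors; only viable for tiny n.
    return min(product((0, 1), repeat=len(Q)), key=lambda a: qubo_energy(Q, a))

if __name__ == "__main__":
    x = [1.0, 1.0, 0.0]
    atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    Q = build_qubo(x, atoms, lam=0.1)
    print(brute_force(Q))  # selects the single atom that reconstructs x exactly
```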

[82]  arXiv:2405.20526 [pdf, ps, other]
Title: Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions
Comments: Learning @ Scale 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Knowledge Components (KCs) linked to assessments enhance the measurement of student learning, enrich analytics, and facilitate adaptivity. However, generating and linking KCs to assessment items requires significant effort and domain-specific knowledge. To streamline this process for higher-education courses, we employed GPT-4 to generate KCs for multiple-choice questions (MCQs) in Chemistry and E-Learning. We analyzed discrepancies between the KCs generated by the Large Language Model (LLM) and those made by humans through evaluation from three domain experts in each subject area. This evaluation aimed to determine whether, in instances of non-matching KCs, evaluators showed a preference for the LLM-generated KCs over their human-created counterparts. We also developed an ontology induction algorithm to cluster questions that assess similar KCs based on their content. Our most effective LLM strategy accurately matched KCs for 56% of Chemistry and 35% of E-Learning MCQs, with even higher success when considering the top five KC suggestions. Human evaluators favored LLM-generated KCs, choosing them over human-assigned ones approximately two-thirds of the time, a preference that was statistically significant across both domains. Our clustering algorithm successfully grouped questions by their underlying KCs without needing explicit labels or contextual information. This research advances the automation of KC generation and classification for assessment items, alleviating the need for student data or predefined KC labels.

[83]  arXiv:2405.20527 [pdf, other]
Title: Towards Ontology-Enhanced Representation Learning for Large Language Models
Comments: 14 pages, 1 figure
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Taking advantage of the widespread use of ontologies to organise and harmonize knowledge across several distinct domains, this paper proposes a novel approach to improve an embedding-Large Language Model (embedding-LLM) of interest by infusing the knowledge formalized by a reference ontology: ontological knowledge infusion aims at boosting the ability of the considered LLM to effectively model the knowledge domain described by the infused ontology. The linguistic information (i.e. concept synonyms and descriptions) and structural information (i.e. is-a relations) formalized by the ontology are utilized to compile a comprehensive set of concept definitions, with the assistance of a powerful generative LLM (i.e. GPT-3.5-turbo). These concept definitions are then employed to fine-tune the target embedding-LLM using a contrastive learning framework. To demonstrate and evaluate the proposed approach, we utilize the biomedical disease ontology MONDO. The results show that embedding-LLMs enhanced by ontological disease knowledge exhibit an improved capability to effectively evaluate the similarity of in-domain sentences from biomedical documents mentioning diseases, without compromising their out-of-domain performance.

[84]  arXiv:2405.20529 [pdf, ps, other]
Title: An Automatic Question Usability Evaluation Toolkit
Comments: Artificial Intelligence in Education 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Evaluating multiple-choice questions (MCQs) involves either labor-intensive human assessments or automated methods that prioritize readability, often overlooking deeper question design flaws. To address this issue, we introduce the Scalable Automatic Question Usability Evaluation Toolkit (SAQUET), an open-source tool that leverages the Item-Writing Flaws (IWF) rubric for a comprehensive and automated quality evaluation of MCQs. By harnessing the latest in large language models such as GPT-4, advanced word embeddings, and Transformers designed to analyze textual complexity, SAQUET effectively pinpoints and assesses a wide array of flaws in MCQs. We first demonstrate the discrepancy between commonly used automated evaluation metrics and the human assessment of MCQ quality. Then we evaluate SAQUET on a diverse dataset of MCQs across the five domains of Chemistry, Statistics, Computer Science, Humanities, and Healthcare, showing how it effectively distinguishes between flawed and flawless questions, providing a level of analysis beyond what is achievable with traditional metrics. With an accuracy rate of over 94% in detecting the presence of flaws identified by human evaluators, our findings emphasize the limitations of existing evaluation methods and showcase potential in improving the quality of educational assessments.

[85]  arXiv:2405.20530 [pdf, ps, other]
Title: Impact of Connected and Automated Vehicles on Transport Injustices
Subjects: Human-Computer Interaction (cs.HC)

Connected and automated vehicles (CAVs) are poised to transform the transport system. However, significant uncertainties remain about their impact, particularly regarding concerns that this advanced technology might exacerbate injustices, such as safety disparities for vulnerable road users (VRUs). Therefore, understanding the potential conflicts of this technology with societal values such as justice and safety is crucial for responsible implementation. To date, no research has focused on what safety and justice in transport mean in the context of CAV deployment and how the potential benefits of CAVs can be harnessed without exacerbating the existing vulnerabilities and injustices VRUs face. This paper addresses this gap by exploring car drivers' and pedestrians' perceptions of safety and justice issues that CAVs might exacerbate using an existing theoretical framework. Employing a qualitative approach, the study delves into the nuanced aspects of these concepts. Interviews were conducted with 30 participants aged between 18 and 79 in Queensland, Australia. These interviews were recorded, transcribed, organised, and analysed using reflexive thematic analysis. Three main themes emerged from the participants' discussions: CAVs as a safety problem for VRUs, CAVs as a justice problem for VRUs, and CAVs as an alignment with societal values problem. Participants emphasised the safety challenges CAVs pose for VRUs, highlighting the need for thorough evaluation and regulatory oversight. Concerns were also raised about CAVs potentially marginalising vulnerable groups within society. Participants advocated for inclusive discussions and a justice-oriented approach to designing a comprehensive transport system to address these concerns.

[86]  arXiv:2405.20531 [pdf, ps, other]
Title: Mitigating the Impact of Labeling Errors on Training via Rockafellian Relaxation
Subjects: Machine Learning (cs.LG)

Labeling errors in datasets are common, if not systematic, in practice. They naturally arise in a variety of contexts: human labeling, noisy labeling, and weak labeling (e.g., in image classification). This presents a persistent and pervasive stress on machine learning practice. In particular, neural network (NN) architectures can withstand minor amounts of dataset imperfection with traditional countermeasures such as regularization, data augmentation, and batch normalization. However, major dataset imperfections often prove insurmountable. We propose and study the implementation of Rockafellian Relaxation (RR), a new loss reweighting, architecture-independent methodology, for neural network training. Experiments indicate RR can enhance standard neural network methods to achieve robust performance across classification tasks in computer vision and natural language processing (sentiment analysis). We find that RR can mitigate the effects of dataset corruption due to (heavy) labeling error, adversarial perturbation, or both, demonstrating effectiveness across a variety of data domains and machine learning tasks.

[87]  arXiv:2405.20534 [pdf, other]
Title: Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

An exciting and promising frontier for Deep Reinforcement Learning (DRL) is its application to real-world robotic systems. While modern DRL approaches have achieved remarkable successes in many robotic scenarios (including mobile robotics, surgical assistance, and autonomous driving), unpredictable and non-stationary environments can pose critical challenges to such methods. These features can significantly undermine fundamental requirements for a successful training process, such as the Markovian properties of the transition model. To address this challenge, we propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and DRL. In more detail, we show that our benchmarking environment is problematic even for state-of-the-art DRL approaches that may struggle to generate reliable policies in terms of generalization power and safety. Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques (such as curriculum learning and learnable hyperparameters). Our extensive empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results. Our simulation environment and training baselines are freely available to facilitate further research on this open problem and encourage collaboration in the field.

[88]  arXiv:2405.20535 [pdf, other]
Title: Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Instruction Fine-Tuning (IFT) significantly enhances the zero-shot capabilities of pretrained Large Language Models (LLMs). While coding data is known to boost reasoning abilities during LLM pretraining, its role in activating internal reasoning capacities during IFT remains understudied. This paper investigates a key question: How does coding data impact LLMs' reasoning capacities during the IFT stage? To explore this, we thoroughly examine the impact of coding data across different coding data proportions, model families, sizes, and reasoning domains, from various perspectives. Specifically, we create three IFT datasets with increasing coding data proportions, fine-tune six LLM backbones across different families and scales on these datasets, evaluate the tuned models' performance across twelve tasks in three reasoning domains, and analyze the outcomes from three broad-to-granular perspectives: overall, domain-level, and task-specific. Our holistic analysis provides valuable insights in each perspective. First, coding data tuning enhances the overall reasoning capabilities of LLMs across different model families and scales. Moreover, the effect of coding data varies among different domains but shows consistent trends across model families and scales within each domain. Additionally, coding data generally yields comparable task-specific benefits across different model families, with the optimal coding data proportions in IFT datasets being task-specific.

[89]  arXiv:2405.20538 [pdf, other]
Title: Q-learning as a monotone scheme
Authors: Lingyi Yang
Subjects: Machine Learning (cs.LG)

Stability issues with reinforcement learning methods persist. To better understand some of these stability and convergence issues involving deep reinforcement learning methods, we examine a simple linear quadratic example. We interpret the convergence criterion of exact Q-learning in the sense of a monotone scheme and discuss consequences of function approximation on monotonicity properties.

[90]  arXiv:2405.20539 [pdf, other]
Title: SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents
Comments: 23 pages, 14 figures, NeurIPS
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Reinforcement learning (RL) is an actively growing field that is seeing increased usage in real-world, safety-critical applications -- making it paramount to ensure the robustness of RL algorithms against adversarial attacks. In this work we explore a particularly stealthy form of training-time attacks against RL -- backdoor poisoning. Here the adversary intercepts the training of an RL agent with the goal of reliably inducing a particular action when the agent observes a pre-determined trigger at inference time. We uncover theoretical limitations of prior work by proving their inability to generalize across domains and MDPs. Motivated by this, we formulate a novel poisoning attack framework which interlinks the adversary's objectives with those of finding an optimal policy -- guaranteeing attack success in the limit. Using insights from our theoretical analysis we develop "SleeperNets" as a universal backdoor attack which exploits a newly proposed threat model and leverages dynamic reward poisoning techniques. We evaluate our attack in 6 environments spanning multiple domains and demonstrate significant improvements in attack success over existing methods, while preserving benign episodic return.

[91]  arXiv:2405.20540 [pdf, ps, other]
Title: Fully Unconstrained Online Learning
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We provide an online learning algorithm that obtains regret $G\|w_\star\|\sqrt{T\log(\|w_\star\|G\sqrt{T})} + \|w_\star\|^2 + G^2$ on $G$-Lipschitz convex losses for any comparison point $w_\star$ without knowing either $G$ or $\|w_\star\|$. Importantly, this matches the optimal bound $G\|w_\star\|\sqrt{T}$ available with such knowledge (up to logarithmic factors), unless either $\|w_\star\|$ or $G$ is so large that even $G\|w_\star\|\sqrt{T}$ is roughly linear in $T$. Thus, it matches the optimal bound in all cases in which one can achieve sublinear regret, which arguably covers most "interesting" scenarios.
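
As an illustrative check (not taken from the paper), the two additive terms can be compared with the leading term using only the quantities named in the abstract. Assuming $G > 0$ and $w_\star \neq 0$, and ignoring constants and the logarithmic factor:

```latex
\[
\|w_\star\|^2 \le G\|w_\star\|\sqrt{T}
\iff \|w_\star\| \le G\sqrt{T},
\qquad
G^2 \le G\|w_\star\|\sqrt{T}
\iff G \le \|w_\star\|\sqrt{T}.
\]
```

So the extra $\|w_\star\|^2 + G^2$ is at most $2G\|w_\star\|\sqrt{T}$ whenever the ratio $G/\|w_\star\|$ lies between $1/\sqrt{T}$ and $\sqrt{T}$. The additive terms can dominate only when one of the two quantities is much larger than the other.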

[92]  arXiv:2405.20541 [pdf, other]
Title: Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a larger model can yield high-quality data, we investigate whether smaller models can be used for perplexity-based pruning and how pruning is affected by the domain composition of the data being pruned. We demonstrate that for multiple dataset compositions, perplexity-based pruning of pretraining data can significantly improve downstream task performance: pruning based on perplexities computed with a 125 million parameter model improves the average performance on downstream tasks of a 3 billion parameter model by up to 2.04 and achieves up to a $1.45\times$ reduction in pretraining steps to reach commensurate baseline performance. Furthermore, we demonstrate that such perplexity-based data pruning also yields downstream performance gains in the over-trained and data-constrained regimes.
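
The general idea of perplexity-based pruning can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the abstract does not state which part of the perplexity distribution is kept, so keeping the lowest-perplexity fraction is an assumption, and `prune_by_perplexity`, `keep_fraction`, and the sample log-probabilities are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of one document from per-token natural-log probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def prune_by_perplexity(docs, keep_fraction=0.5):
    """Return ids of the keep_fraction of documents with the lowest perplexity.

    docs maps a document id to its per-token log-probabilities under the
    small reference model. Keeping the low-perplexity end is an assumption
    made for illustration only.
    """
    ranked = sorted(docs, key=lambda doc_id: perplexity(docs[doc_id]))
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Hypothetical per-token log-probabilities for four documents.
docs = {
    "a": [-0.1, -0.2, -0.1],
    "b": [-2.0, -3.0, -2.5],
    "c": [-0.5, -0.4, -0.6],
    "d": [-1.0, -1.2, -0.8],
}
kept = prune_by_perplexity(docs, keep_fraction=0.5)
```

In practice the log-probabilities would come from scoring each document with the reference model. The selection rule (low, middle, or high perplexity) is a design choice that would need to be tuned per dataset.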

[93]  arXiv:2405.20542 [pdf, ps, other]
Title: On the Connection Between Non-negative Matrix Factorization and Latent Dirichlet Allocation
Comments: 9 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Non-negative matrix factorization with the generalized Kullback-Leibler divergence (NMF) and latent Dirichlet allocation (LDA) are two popular approaches for dimensionality reduction of non-negative data. Here, we show that NMF with $\ell_1$ normalization constraints on the columns of both matrices of the decomposition and a Dirichlet prior on the columns of one matrix is equivalent to LDA. To show this, we demonstrate that explicitly accounting for the scaling ambiguity of NMF by adding $\ell_1$ normalization constraints to the optimization problem allows a joint update of both matrices in the widely used multiplicative updates (MU) algorithm. When both of the matrices are normalized, the joint MU algorithm leads to probabilistic latent semantic analysis (PLSA), which is LDA without a Dirichlet prior. Our approach of deriving joint updates for NMF also reveals that a Lasso penalty on one matrix together with an $\ell_1$ normalization constraint on the other matrix is insufficient to induce any sparsity.

[94]  arXiv:2405.20543 [pdf, other]
Title: Towards a General GNN Framework for Combinatorial Optimization
Comments: 15 pages, 1 figure
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM)

Graph neural networks (GNNs) have achieved great success for a variety of tasks such as node classification, graph classification, and link prediction. However, the use of GNNs (and machine learning more generally) to solve combinatorial optimization (CO) problems is much less explored. Here, we introduce a novel GNN architecture which leverages a complex filter bank and localized attention mechanisms designed to solve CO problems on graphs. We show how our method differentiates itself from prior GNN-based CO solvers and how it can be effectively applied to the maximum clique, minimum dominating set, and maximum cut problems in a self-supervised learning setting. In addition to demonstrating competitive overall performance across all tasks, we establish state-of-the-art results for the max cut problem.

[95]  arXiv:2405.20549 [pdf, other]
Title: Discrete-Time Implementation of Explicit Reference Governor
Subjects: Systems and Control (eess.SY)

Explicit reference governor (ERG) is an add-on unit that provides constraint handling capability to pre-stabilized systems. The main idea behind ERG is to manipulate the derivative of the applied reference in continuous time such that the satisfaction of state and input constraints is guaranteed at all times. However, ERG should be practically implemented in discrete-time. This paper studies the discrete-time implementation of ERG, and provides conditions under which the feasibility and convergence properties of the ERG framework are maintained when the updates of the applied reference are performed in discrete time. The proposed approach is validated via extensive simulation and experimental studies.

[96]  arXiv:2405.20550 [pdf, ps, other]
Title: Uncertainty Quantification for Deep Learning
Comments: 25 pages, 4 figures, submitted to Environmental Data Science
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A complete and statistically consistent uncertainty quantification for deep learning is provided, including the sources of uncertainty arising from (1) the new input data, (2) the training and testing data, (3) the weight vectors of the neural network, and (4) the neural network because it is not a perfect predictor. Using Bayes' theorem and conditional probability densities, we demonstrate how each uncertainty source can be systematically quantified. We also introduce a fast and practical way to incorporate and combine all sources of errors for the first time. For illustration, the new method is applied to quantify errors in cloud autoconversion rates, predicted from an artificial neural network that was trained by aircraft cloud probe measurements in the Azores and the stochastic collection equation formulated as a two-moment bin model. For this specific example, the output uncertainty arising from uncertainty in the training and testing data is dominant, followed by uncertainty in the input data, in the trained neural network, and uncertainty in the weights. We discuss the usefulness of the methodology for machine learning practice, and how, through inclusion of uncertainty in the training data, the new methodology is less sensitive to input data that falls outside of the training data set.

[97]  arXiv:2405.20551 [pdf, other]
Title: EM-Assist: Safe Automated ExtractMethod Refactoring with LLMs
Comments: This paper is accepted to the tool demonstration track of the 32nd ACM Symposium on the Foundations of Software Engineering (FSE 2024). This is an author copy
Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Programming Languages (cs.PL)

Excessively long methods, loaded with multiple responsibilities, are challenging to understand, debug, reuse, and maintain. The solution lies in the widely recognized Extract Method refactoring. While the application of this refactoring is supported in modern IDEs, recommending which code fragments to extract has been the topic of many research tools. However, they often struggle to replicate real-world developer practices, resulting in recommendations that do not align with what a human developer would do in real life. To address this issue, we introduce EM-Assist, an IntelliJ IDEA plugin that uses LLMs to generate refactoring suggestions and subsequently validates, enhances, and ranks them. Finally, EM-Assist uses the IntelliJ IDE to apply the user-selected recommendation. In our extensive evaluation of 1,752 real-world refactorings that actually took place in open-source projects, EM-Assist's recall rate was 53.4% among its top-5 recommendations, compared to 39.4% for the previous best-in-class tool that relies solely on static analysis. Moreover, we conducted a usability survey with 18 industrial developers and 94.4% gave a positive rating.

[98]  arXiv:2405.20555 [pdf, other]
Title: Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
Subjects: Machine Learning (cs.LG)

In offline reinforcement learning (RL), it is necessary to manage out-of-distribution actions to prevent overestimation of value functions. Policy-regularized methods address this problem by constraining the target policy to stay close to the behavior policy. Although several approaches suggest representing the behavior policy as an expressive diffusion model to boost performance, it remains unclear how to regularize the target policy given a diffusion-modeled behavior sampler. In this paper, we propose Diffusion Actor-Critic (DAC), which formulates the Kullback-Leibler (KL) constraint policy iteration as a diffusion noise regression problem, enabling direct representation of target policies as diffusion models. Our approach follows the actor-critic learning paradigm, in which we alternately train a diffusion-modeled target policy and a critic network. The actor training loss includes a soft Q-guidance term from the Q-gradient. The soft Q-guidance is grounded in the theoretical solution of the KL constraint policy iteration, which prevents the learned policy from taking out-of-distribution actions. For critic training, we train a Q-ensemble to stabilize the estimation of the Q-gradient. Additionally, DAC employs a lower confidence bound (LCB) to address the overestimation and underestimation of value targets due to function approximation error. Our approach is evaluated on the D4RL benchmarks and outperforms the state of the art in almost all environments. Code is available at https://github.com/Fang-Lin93/DAC.
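
A lower confidence bound over a Q-ensemble is often computed as the ensemble mean minus a multiple of the ensemble standard deviation. The sketch below shows that common form only. The abstract does not give DAC's exact formula, so the function `lcb_value`, the coefficient `beta`, and the sample values are assumptions for illustration.

```python
from statistics import mean, pstdev


def lcb_value(q_estimates, beta=1.0):
    """Lower confidence bound: ensemble mean minus beta times ensemble std.

    q_estimates holds the Q-values that each ensemble member assigns to the
    same state-action pair. beta is a hypothetical pessimism coefficient.
    """
    return mean(q_estimates) - beta * pstdev(q_estimates)


# Hypothetical Q-values from a three-member ensemble.
q_ensemble = [10.0, 12.0, 14.0]
target = lcb_value(q_ensemble, beta=1.0)
```

With `beta = 0` the target reduces to the plain ensemble mean. Larger values of `beta` penalize state-action pairs on which the ensemble members disagree.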

[99]  arXiv:2405.20556 [pdf, other]
Title: Certifying Global Robustness for Deep Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

A globally robust deep neural network resists perturbations on all meaningful inputs. Current robustness certification methods emphasize local robustness, struggling to scale and generalize. This paper presents a systematic and efficient method to evaluate and verify global robustness for deep neural networks, leveraging the PAC verification framework for solid guarantees on verification results. We utilize probabilistic programs to characterize meaningful input regions, setting a realistic standard for global robustness. Additionally, we introduce the cumulative robustness curve as a criterion in evaluating global robustness. We design a statistical method that combines multi-level splitting and regression analysis for the estimation, significantly reducing the execution time. Experimental results demonstrate the efficiency and effectiveness of our verification method and its capability to find rare and diversified counterexamples for adversarial training.

[100]  arXiv:2405.20560 [pdf, other]
Title: Collaborative Resource Management and Workloads Scheduling in Cloud-Assisted Mobile Edge Computing across Timescales
Comments: 11 pages, 10 figures
Journal-ref: IEEE ICWS 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Due to the limited resource capacity of edge servers and the high purchase costs of edge resources, service providers are facing the new challenge of how to take full advantage of the constrained edge resources for Internet of Things (IoT) service hosting and task scheduling to maximize system performance. In this paper, we study the joint optimization problem on service placement, resource provisioning, and workloads scheduling under resource and budget constraints, which is formulated as a mixed integer non-linear programming problem. Given that the frequent service placement and resource provisioning will significantly increase system configuration costs and instability, we propose a two-timescale framework for resource management and workloads scheduling, named RMWS. RMWS consists of a Gibbs sampling algorithm and an alternating minimization algorithm to determine the service placement and resource provisioning on large timescales. A sub-gradient descent method is designed to solve the workload scheduling challenge on small timescales. We conduct comprehensive experiments under different parameter settings. RMWS consistently ensures a minimum 10% performance enhancement compared to other algorithms, showcasing its superiority. Theoretical proofs are also provided accordingly.

[101]  arXiv:2405.20561 [pdf, other]
Title: All Your Tokens are Belong to Us: Demystifying Address Verification Vulnerabilities in Solidity Smart Contracts
Comments: Accepted by USENIX Security 2024
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

In Ethereum, verifying the validity of passed addresses is a common practice and a crucial step to ensure the secure execution of smart contracts. Vulnerabilities in the process of address verification can lead to serious security issues, and anecdotal evidence has been reported by our community. However, this type of vulnerability has not been well studied. To fill the void, in this paper, we aim to characterize and detect this kind of emerging vulnerability. We design and implement AVVERIFIER, a lightweight taint analyzer based on static EVM opcode simulation. Its three-phase detector can progressively rule out false positives and false negatives based on the intrinsic characteristics. Upon a well-established and unbiased benchmark, AVVERIFIER is 2 to 5 times more efficient than the SOTA while maintaining a 94.3% precision and 100% recall. After a large-scale evaluation of over 5 million Ethereum smart contracts, we have identified 812 vulnerable smart contracts that were undisclosed by our community before this work, and 348 open source smart contracts were further verified, whose largest total value locked is over $11.2 billion. We further deploy AVVERIFIER as a real-time detector on Ethereum and Binance Smart Chain, and the results suggest that AVVERIFIER can raise timely warnings once contracts are deployed.

[102]  arXiv:2405.20562 [pdf, other]
Title: Can Machine Learning Assist in Diagnosis of Primary Immune Thrombocytopenia? A feasibility study
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Primary immune thrombocytopenia (ITP) is a rare autoimmune disease characterised by immune-mediated destruction of peripheral blood platelets in patients, leading to low platelet counts and bleeding. The diagnosis and effective management of ITP are challenging because there is no established test to confirm the disease and no biomarker with which one can predict the response to treatment and outcome. In this work we conduct a feasibility study to check if machine learning can be applied effectively for diagnosis of ITP using routine blood tests and demographic data in a non-acute outpatient setting. Various ML models, including Logistic Regression, Support Vector Machine, k-Nearest Neighbor, Decision Tree and Random Forest, were applied to data from the UK Adult ITP Registry and a general hematology clinic. Two different approaches were investigated: a demographic-unaware and a demographic-aware one. We conduct extensive experiments to evaluate the predictive performance of these models and approaches, as well as their bias. The results revealed that Decision Tree and Random Forest models were both superior and fair, achieving nearly perfect predictive and fairness scores, with platelet count identified as the most significant variable. Models not provided with demographic information performed better in terms of predictive accuracy but showed lower fairness scores, illustrating a trade-off between predictive performance and fairness.

[103]  arXiv:2405.20565 [pdf, other]
Title: Knowledge Enhanced Multi-intent Transformer Network for Recommendation
Comments: Accepted by The Web Conf 2024 (WWW 2024) Industry Track. arXiv admin note: text overlap with arXiv:2204.08807
Subjects: Information Retrieval (cs.IR)

Incorporating Knowledge Graphs (KGs) into Recommendation has attracted growing attention in industry, due to the great potential of KGs in providing abundant supplementary information and interpretability for the underlying models. However, simply integrating a KG into recommendation usually brings in negative feedback in industry, because the following two factors are ignored: i) users' multiple intents, which involve diverse nodes in the KG. For example, in e-commerce scenarios, users may exhibit preferences for specific styles, brands, or colors. ii) knowledge noise, which is a prevalent issue in Knowledge Enhanced Recommendation (KGR) and even more severe in industry scenarios. The irrelevant knowledge properties of items may result in inferior model performance compared to approaches that do not incorporate knowledge. To tackle these challenges, we propose a novel approach named Knowledge Enhanced Multi-intent Transformer Network for Recommendation (KGTN), comprising two primary modules: Global Intents Modeling with Graph Transformer, and Knowledge Contrastive Denoising under Intents. Specifically, Global Intents Modeling with Graph Transformer focuses on capturing learnable user intents by incorporating global signals from user-item-relation-entity interactions with a graph transformer, while learning intent-aware user/item representations. Knowledge Contrastive Denoising under Intents is dedicated to learning precise and robust representations. It leverages intent-aware representations to sample relevant knowledge, and proposes a local-global contrastive mechanism to enhance noise-irrelevant representation learning. Extensive experiments conducted on benchmark datasets show the superior performance of our proposed method over the state of the art. Online A/B testing results on Alibaba's large-scale industrial recommendation platform also indicate the real-scenario effectiveness of KGTN.

[104]  arXiv:2405.20567 [pdf, other]
Title: Fast Decentralized State Estimation for Legged Robot Locomotion via EKF and MHE
Subjects: Robotics (cs.RO)

In this paper, we present a fast and decentralized state estimation framework for the control of legged locomotion. The nonlinear estimation of the floating base states is decentralized to an orientation estimation via Extended Kalman Filter (EKF) and a linear velocity estimation via Moving Horizon Estimation (MHE). The EKF fuses the inertia sensor with vision to estimate the floating base orientation. The MHE uses the estimated orientation with all the sensors within a time window in the past to estimate the linear velocities based on a time-varying linear dynamics formulation of the states of interest with state constraints. More importantly, a marginalization method based on the optimization structure of the full information filter (FIF) is proposed to convert the equality-constrained FIF to an equivalent MHE. This decoupling of state estimation promotes the desired balance of computation efficiency, accuracy of estimation, and the inclusion of state constraints. The proposed method is shown to be capable of providing accurate state estimation to several legged robots, including the highly dynamic hopping robot PogoX, the bipedal robot Cassie, and the quadrupedal robot Unitree Go1, at a frequency of 200 Hz and a window interval of 0.1 s.

[105]  arXiv:2405.20568 [pdf, other]
Title: Generative AI for Deep Reinforcement Learning: Framework, Analysis, and Use Cases
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

As a form of artificial intelligence (AI) technology based on interactive learning, deep reinforcement learning (DRL) has been widely applied across various fields and has achieved remarkable accomplishments. However, DRL faces certain limitations, including low sample efficiency and poor generalization. Therefore, in this paper we present how to leverage generative AI (GAI) to address these issues and enhance the performance of DRL algorithms. We first introduce several classic GAI and DRL algorithms and demonstrate the applications of GAI-enhanced DRL algorithms. Then, we discuss how to use GAI to improve DRL algorithms from the data and policy perspectives. Subsequently, we introduce a framework that demonstrates an actual and novel integration of GAI with DRL, i.e., GAI-enhanced DRL. Additionally, we provide a case study of the framework on UAV-assisted integrated near-field/far-field communication to validate the performance of the proposed framework. Moreover, we present several future directions. Finally, the related code is available at: https://xiewenwen22.github.io/GAI-enhanced-DRL.

[106]  arXiv:2405.20573 [pdf, other]
Title: Enhancing Generative Molecular Design via Uncertainty-guided Fine-tuning of Variational Autoencoders
Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

In recent years, deep generative models have been successfully adopted for various molecular design tasks, particularly in the life and material sciences. A critical challenge for pre-trained generative molecular design (GMD) models is to fine-tune them to be better suited for downstream design tasks aimed at optimizing specific molecular properties. However, redesigning and training an existing effective generative model from scratch for each new design task is impractical. Furthermore, the black-box nature of typical downstream tasks, such as property prediction, makes it nontrivial to optimize the generative model in a task-specific manner. In this work, we propose a novel approach for a model uncertainty-guided fine-tuning of a pre-trained variational autoencoder (VAE)-based GMD model through performance feedback in an active learning setting. The main idea is to quantify model uncertainty in the generative model, which is made efficient by working within a low-dimensional active subspace of the high-dimensional VAE parameters explaining most of the variability in the model's output. The inclusion of model uncertainty expands the space of viable molecules through decoder diversity. We then explore the resulting model uncertainty class via black-box optimization made tractable by low-dimensionality of the active subspace. This enables us to identify and leverage a diverse set of high-performing models to generate enhanced molecules. Empirical results across six target molecular properties, using multiple VAE-based generative models, demonstrate that our uncertainty-guided fine-tuning approach consistently outperforms the original pre-trained models.

[107]  arXiv:2405.20574 [pdf, other]
Title: Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark
Comments: Accepted at ACL 2024 Main
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper introduces the Open Ko-LLM Leaderboard and the Ko-H5 Benchmark as vital tools for evaluating Large Language Models (LLMs) in Korean. Incorporating private test sets while mirroring the English Open LLM Leaderboard, we establish a robust evaluation framework that has been well integrated in the Korean LLM community. We perform data leakage analysis that shows the benefit of private test sets along with a correlation study within the Ko-H5 benchmark and temporal analyses of the Ko-H5 score. Moreover, we present empirical support for the need to expand beyond set benchmarks. We hope the Open Ko-LLM Leaderboard sets a precedent for expanding LLM evaluation to foster more linguistic diversity.

[108]  arXiv:2405.20576 [pdf, other]
Title: Federated Graph Analytics with Differential Privacy
Comments: 13 pages
Subjects: Cryptography and Security (cs.CR)

Collaborative graph analysis across multiple institutions is becoming increasingly popular. Realistic examples include social network analysis across various social platforms, financial transaction analysis across multiple banks, and analyzing the transmission of infectious diseases across multiple hospitals. We define federated graph analytics, a new problem for collaborative graph analytics under differential privacy. Although differentially private graph analysis has been widely studied, it fails to achieve a good tradeoff between utility and privacy in federated scenarios, due to the limited view of local clients and overlapping information across multiple subgraphs. Motivated by this, we first propose a federated graph analytic framework, named FEAT, which enables arbitrary downstream common graph statistics while preserving individual privacy. Furthermore, we introduce an optimized framework based on our proposed degree-based partition algorithm, called FEAT+, which improves the overall utility by leveraging the true local subgraphs. Finally, extensive experiments demonstrate that our FEAT and FEAT+ significantly outperform the baseline approach by approximately one and four orders of magnitude, respectively.

[109]  arXiv:2405.20579 [pdf, other]
Title: HOPE: A Reinforcement Learning-based Hybrid Policy Path Planner for Diverse Parking Scenarios
Comments: 10 pages, 6 tables, 5 figures, 1 page appendix
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Path planning plays a pivotal role in automated parking, yet current methods struggle to efficiently handle intricate and diverse parking scenarios. One potential solution is the reinforcement learning-based method, leveraging its exploration in unrecorded situations. However, a key challenge in training reinforcement learning methods is the inherent randomness in converging to a feasible policy. This paper introduces a novel solution, the Hybrid POlicy Path plannEr (HOPE), which integrates a reinforcement learning agent with Reeds-Shepp curves, enabling effective planning across diverse scenarios. The paper presents a method to calculate and implement an action mask mechanism in path planning, significantly boosting the efficiency and effectiveness of reinforcement learning training. A transformer is employed as the network structure to fuse environmental information and generate planned paths. To facilitate the training and evaluation of the proposed planner, we propose a criterion for categorizing the difficulty level of parking scenarios based on space and obstacle distribution. Experimental results demonstrate that our approach outperforms typical rule-based algorithms and traditional reinforcement learning methods, showcasing higher planning success rates and generalization across various scenarios. The code for our solution will be openly available on GitHub (https://github.com/jiamiya/HOPE).

[110]  arXiv:2405.20580 [pdf, other]
Title: Topology-Aware Blending Method for Implicit Heterogeneous Porous Model Design
Subjects: Graphics (cs.GR)

Porous structures are materials consisting of minuscule pores, where the microstructure morphology significantly impacts their macroscopic properties.
Integrating different porous structures through a blending method is indispensable to cater to diverse functional regions in heterogeneous models.
Previous studies on blending methods for porous structures have mainly focused on controlling the shape of blending regions, yet they have fallen short in effectively addressing topological errors in blended structures.
This paper introduces a new blending method that successfully addresses this issue.
Initially, a novel initialization method is proposed, which includes distinct strategies for blending regions of varying complexities.
Subsequently, we formulate the challenge of eliminating topological errors as an optimization problem based on persistent homology.
Through iterative updates of control coefficients, this optimization problem is solved to generate a blended porous structure.
Our approach not only avoids topological errors but also governs the shape and positioning of the blending region while leaving the structure outside the blending region unchanged.
The experimental outcomes validate the effectiveness of our method in producing high-quality blended porous structures.
Furthermore, these results highlight potential applications of our blending method in biomimetics and the design of high-stiffness mechanical heterogeneous models.

[111]  arXiv:2405.20582 [pdf, ps, other]
Title: The Point of View of a Sentiment: Towards Clinician Bias Detection in Psychiatric Notes
Comments: Oral presentation at NAACL 2024 Queer in AI Workshop
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In psychiatry, negative patient descriptions and stigmatizing language can contribute to healthcare disparities in two ways: (1) when read by patients, they can harm trust and engagement with the medical center; (2) when read by future providers, they may negatively influence the providers' perspective of a patient. By leveraging large language models, this work aims to identify the sentiment expressed in psychiatric clinical notes based on the reader's point of view. Extracting sentences from the Mount Sinai Health System's large and diverse clinical notes, we used prompts and in-context learning to adapt three large language models (GPT-3.5, Llama 2, Mistral) to classify the sentiment conveyed by the sentences according to the provider or non-provider point of view. Results showed that GPT-3.5 aligns best with the provider point of view, whereas Mistral aligns best with the non-provider point of view.

[112]  arXiv:2405.20583 [pdf, other]
Title: The Gestalt Computational Model
Subjects: Computational Geometry (cs.CG); Algebraic Topology (math.AT)

Widely employed in cognitive psychology, Gestalt theory elucidates basic principles in visual perception, but meanwhile presents significant challenges for computation. The advancement of artificial intelligence requires the emulation of human cognitive behavior, for which Gestalt theory serves as a fundamental framework describing human visual cognitive behavior. In this paper, we utilize persistent homology, a mathematical tool in computational topology, to develop a computational model for Gestalt theory, addressing the challenges of quantification and computation. The Gestalt computational model not only holds promise for applications in artificial intelligence and computer vision, but also opens a new research direction of computational visual perception.

[113]  arXiv:2405.20584 [pdf, other]
Title: Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

With the development of diffusion-based customization methods like DreamBooth, individuals can now train models that generate their personalized images. Despite the convenience, malicious users have misused these techniques to create fake images, thereby triggering a privacy security crisis. In light of this, proactive adversarial attacks are proposed to protect users against customization. The adversarial examples are trained to distort the customization model's outputs and thus block the misuse. In this paper, we propose DisDiff (Disrupting Diffusion), a novel adversarial attack method to disrupt the diffusion model outputs. We first delve into the intrinsic image-text relationships, known as cross-attention, and empirically find that the subject-identifier token plays an important role in guiding image generation. Thus, we propose the Cross-Attention Erasure module to explicitly "erase" the indicated attention maps and disrupt the text guidance. In addition, we analyze the influence of the sampling process of the diffusion model on the Projected Gradient Descent (PGD) attack and introduce a novel Merit Sampling Scheduler to adaptively modulate the perturbation updating amplitude in a step-aware manner. Our DisDiff outperforms the state-of-the-art methods by 12.75% in FDFR scores and 7.25% in ISM scores on average across two facial benchmarks and two commonly used prompts.

[114]  arXiv:2405.20585 [pdf, other]
Title: GAMedX: Generative AI-based Medical Entity Data Extractor Using Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In the rapidly evolving field of healthcare and beyond, the integration of generative AI in Electronic Health Records (EHRs) represents a pivotal advancement, addressing a critical gap in current information extraction techniques. This paper introduces GAMedX, a Named Entity Recognition (NER) approach utilizing Large Language Models (LLMs) to efficiently extract entities from medical narratives and unstructured text generated throughout various phases of the patient hospital visit. By addressing the significant challenge of processing unstructured medical text, GAMedX leverages the capabilities of generative AI and LLMs for improved data extraction. Employing a unified approach, the methodology integrates open-source LLMs for NER, utilizing chained prompts and Pydantic schemas for structured output to navigate the complexities of specialized medical jargon. The findings reveal a significant ROUGE F1 score on one of the evaluation datasets with an accuracy of 98%. This innovation enhances entity extraction, offering a scalable, cost-effective solution for automated form filling from unstructured data. As a result, GAMedX streamlines the processing of unstructured narratives, and sets a new standard in NER applications, contributing significantly to theoretical and practical advancements beyond the medical technology sphere.

[115]  arXiv:2405.20587 [pdf, ps, other]
Title: Quality-Aware Task Offloading for Cooperative Perception in Vehicular Edge Computing
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

Task offloading in Vehicular Edge Computing (VEC) can advance cooperative perception (CP) to improve traffic awareness in Autonomous Vehicles. In this paper, we propose the Quality-aware Cooperative Perception Task Offloading (Q-CPTO) scheme. Q-CPTO is the first task offloading scheme that enhances traffic awareness by prioritizing the quality rather than the quantity of cooperative perception. Q-CPTO improves the quality of CP by curtailing perception redundancy and increasing the Value of Information (VOI) procured by each user. We use Kalman filters (KFs) for VOI assessment, predicting the next movement of each vehicle to estimate its region of interest. The estimated VOI is then integrated into the task offloading problem. We formulate the task offloading problem as an Integer Linear Program (ILP) that maximizes the VOI of users and reduces perception redundancy by leveraging the spatially diverse fields of view (FOVs) of vehicles, while adhering to strict latency requirements. We also propose the Q-CPTO-Heuristic (Q-CPTO-H) scheme to solve the task offloading problem in a time-efficient manner. Extensive evaluations show that Q-CPTO significantly outperforms prominent task offloading schemes by up to 14% and 20% in terms of response delay and traffic awareness, respectively. Furthermore, Q-CPTO-H closely approaches the optimal solution, with marginal gaps of up to 1.4% and 2.1% in terms of traffic awareness and the number of collaborating users, respectively, while reducing the runtime by up to 84%.
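
The abstract above mentions using Kalman filters to predict each vehicle's next movement. As a rough illustration only, the following sketch shows the prediction step of a Kalman filter under a constant-velocity motion model. The state layout [x, y, vx, vy], the diagonal process noise, and the names `kf_predict` and `process_var` are assumptions made for this example; the abstract does not state the model the authors use.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def kf_predict(state, cov, dt, process_var):
    """Kalman prediction step for a constant-velocity model (assumed).

    state: [x, y, vx, vy]; cov: 4x4 covariance as a list of rows.
    Returns the predicted state F x and covariance F P F^T + Q,
    with Q = process_var * I as a simplifying assumption.
    """
    F = [[1, 0, dt, 0],
         [0, 1, 0, dt],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
    new_state = [sum(F[i][k] * state[k] for k in range(4)) for i in range(4)]
    new_cov = mat_mul(mat_mul(F, cov), transpose(F))
    for i in range(4):
        new_cov[i][i] += process_var
    return new_state, new_cov


# Hypothetical vehicle at (10, 5) moving with velocity (2, -1).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
pred_state, pred_cov = kf_predict([10, 5, 2, -1], identity, dt=0.5, process_var=0.01)
```

The predicted position and its covariance could then define a region of interest around the vehicle; how that region feeds into the VOI estimate is specific to the paper and is not reproduced here.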

[116]  arXiv:2405.20588 [pdf, other]
Title: DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models
Comments: ACL2024 findings
Subjects: Computation and Language (cs.CL)

Recently, while large language models (LLMs) have demonstrated impressive results, they still suffer from hallucination, i.e., the generation of false information. Model editing is the task of fixing factual mistakes in LLMs; yet, most previous works treat it as a one-time task, paying little attention to ever-emerging mistakes generated by LLMs. We address the task of sequential model editing (SME) that aims to rectify mistakes continuously. A Dynamic Auxiliary Fusion Network (DAFNet) is designed to enhance the semantic interaction among the factual knowledge within the entire sequence, preventing catastrophic forgetting during the editing process of multiple knowledge triples. Specifically, (1) for semantic fusion within a relation triple, we aggregate the intra-editing attention flow into auto-regressive self-attention with token-level granularity in LLMs. We further leverage multi-layer diagonal inter-editing attention flow to update the weighted representations at sequence-level granularity. (2) Considering that auxiliary parameters are required to store the knowledge for sequential editing, we construct a new dataset named DAFSet, fulfilling recent, popular, long-tail and robust properties to enhance the generality of sequential editing. Experiments show DAFNet significantly outperforms strong baselines in single-turn and sequential editing. The usage of DAFSet also consistently improves the performance of other auxiliary network-based methods in various scenarios.

[117]  arXiv:2405.20589 [pdf, other]
Title: Selective Knowledge Sharing for Personalized Federated Learning Under Capacity Heterogeneity
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated Learning (FL) stands to gain significant advantages from collaboratively training capacity-heterogeneous models, enabling the utilization of private data and computing power from low-capacity devices. However, the focus on personalizing capacity-heterogeneous models based on client-specific data has been limited, resulting in suboptimal local model utility, particularly for low-capacity clients. The heterogeneity in both data and device capacity poses two key challenges for model personalization: 1) accurately retaining necessary knowledge embedded within reduced submodels for each client, and 2) effectively sharing knowledge through aggregating size-varying parameters. To this end, we introduce Pa3dFL, a novel framework designed to enhance local model performance by decoupling and selectively sharing knowledge among capacity-heterogeneous models. First, we decompose each layer of the model into general and personal parameters. Then, we maintain uniform sizes for the general parameters across clients and aggregate them through direct averaging. Subsequently, we employ a hyper-network to generate size-varying personal parameters for clients using learnable embeddings. Finally, we facilitate the implicit aggregation of personal parameters by aggregating client embeddings through a self-attention module. We conducted extensive experiments on three datasets to evaluate the effectiveness of Pa3dFL. Our findings indicate that Pa3dFL consistently outperforms baseline methods across various heterogeneity settings. Moreover, Pa3dFL demonstrates competitive communication and computation efficiency compared to baseline approaches, highlighting its practicality and adaptability in adverse system conditions.

[118]  arXiv:2405.20590 [pdf, other]
Title: Class-Based Time Series Data Augmentation to Mitigate Extreme Class Imbalance for Solar Flare Prediction
Subjects: Machine Learning (cs.LG); Instrumentation and Methods for Astrophysics (astro-ph.IM); Solar and Stellar Astrophysics (astro-ph.SR); Artificial Intelligence (cs.AI)

Time series data plays a crucial role across various domains, making it valuable for decision-making and predictive modeling. Machine learning (ML) and deep learning (DL) have shown promise in this regard, yet their performance hinges on data quality and quantity, often constrained by data scarcity and class imbalance, particularly for rare events like solar flares. Data augmentation techniques offer a potential solution to address these challenges, yet their effectiveness on multivariate time series datasets remains underexplored. In this study, we propose a novel data augmentation method for time series data named Mean Gaussian Noise (MGN). We investigate the performance of MGN compared to eight existing basic data augmentation methods on a multivariate time series dataset for solar flare prediction, SWAN-SF, using an ML algorithm for time series data, TimeSeriesSVC. The results demonstrate the efficacy of MGN and highlight its potential for improving classification performance in scenarios with extremely imbalanced data. Our time complexity analysis shows that MGN also has a competitive computational cost compared to the investigated alternative methods.

[119]  arXiv:2405.20592 [pdf, other]
Title: LInK: Learning Joint Representations of Design and Performance Spaces through Contrastive Learning for Mechanism Synthesis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In this paper, we introduce LInK, a novel framework that integrates contrastive learning of performance and design space with optimization techniques for solving complex inverse problems in engineering design with discrete and continuous variables. We focus on the path synthesis problem for planar linkage mechanisms. By leveraging a multi-modal and transformation-invariant contrastive learning framework, LInK learns a joint representation that captures complex physics and design representations of mechanisms, enabling rapid retrieval from a vast dataset of over 10 million mechanisms. This approach improves precision through the warm start of a hierarchical unconstrained nonlinear optimization algorithm, combining the robustness of traditional optimization with the speed and adaptability of modern deep learning methods. Our results on an existing benchmark demonstrate that LInK outperforms existing methods with 28 times less error compared to a state-of-the-art approach while taking 20 times less time. Moreover, we introduce a significantly more challenging benchmark, named LINK-ABC, which involves synthesizing linkages that trace the trajectories of capital letters of the English alphabet - an inverse design benchmark task that existing methods struggle with due to large non-linearities and tiny feasible space. Our results demonstrate that LInK not only advances the field of mechanism design but also broadens the applicability of contrastive learning and optimization to other areas of engineering.

[120]  arXiv:2405.20593 [pdf, other]
Title: Excitable crawling
Comments: 5 pages, MTNS 2024 extended abstract
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

We propose and analyze the suitability of a spiking controller to engineer the locomotion of a soft robotic crawler. Inspired by the FitzHugh-Nagumo model of neural excitability, we design a bistable controller with an electrical flip-flop circuit representation capable of generating spikes on demand when coupled to the passive crawler mechanics. A proprioceptive sensory signal from the crawler mechanics turns the bistability of the controller into rhythmic spiking. The output voltage, in turn, activates the crawler's actuators to generate movement through peristaltic waves. We show through geometric analysis that this control strategy achieves endogenous crawling. The electro-mechanical sensorimotor interconnection provides embodied negative feedback regulation, facilitating locomotion. Dimensional analysis provides insights on the characteristic scales in the crawler's mechanical and electrical dynamics, and how they determine the crawling gait. Adaptive control of the electrical scales to optimally match the mechanical scales can be envisioned to achieve further efficiency, as in homeostatic regulation of neuronal circuits. Our approach can scale up to multiple sensorimotor loops inspired by biological central pattern generators.

[121]  arXiv:2405.20594 [pdf, other]
Title: Deep Learning without Weight Symmetry
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

Backpropagation (BP), a foundational algorithm for training artificial neural networks, predominates in contemporary deep learning. Although highly successful, it is often considered biologically implausible. A significant limitation arises from the need for precise symmetry between connections in the backward and forward pathways to backpropagate gradient signals accurately, which is not observed in biological brains. Researchers have proposed several algorithms to alleviate this symmetry constraint, such as feedback alignment and direct feedback alignment. However, their divergence from backpropagation dynamics presents challenges, particularly in deeper networks and convolutional layers. Here we introduce the Product Feedback Alignment (PFA) algorithm. Our findings demonstrate that PFA closely approximates BP and achieves comparable performance in deep convolutional networks while avoiding explicit weight symmetry. Our results offer a novel solution to the longstanding weight symmetry problem, leading to more biologically plausible learning in deep convolutional networks compared to earlier methods.

[122]  arXiv:2405.20596 [pdf, other]
Title: Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation
Comments: 10 pages; Accepted by NeurIPS 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Traditional semi-supervised learning (SSL) assumes that the feature distributions of labeled and unlabeled data are consistent, which rarely holds in realistic scenarios. In this paper, we propose a novel SSL setting, where unlabeled samples are drawn from a mixed distribution that deviates from the feature distribution of labeled samples. Under this setting, previous SSL methods tend to predict wrong pseudo-labels with the model fitted on labeled data, resulting in noise accumulation. To tackle this issue, we propose Self-Supervised Feature Adaptation (SSFA), a generic framework for improving SSL performance when labeled and unlabeled data come from different distributions. SSFA decouples the prediction of pseudo-labels from the current model to improve the quality of pseudo-labels. In particular, SSFA incorporates a self-supervised task into the SSL framework and uses it to adapt the feature extractor of the model to the unlabeled data. In this way, the extracted features better fit the distribution of unlabeled data, thereby generating high-quality pseudo-labels. Extensive experiments show that our proposed SSFA is applicable to various pseudo-label-based SSL learners and significantly improves performance in labeled, unlabeled, and even unseen distributions.

[123]  arXiv:2405.20599 [pdf, ps, other]
Title: Exact Algorithms for MaxCut on Split Graphs
Authors: Marko Lalovic
Comments: 8 pages, 4 figures
Subjects: Data Structures and Algorithms (cs.DS)

This paper presents an $O^{*}(1.42^{n})$ time algorithm for the Maximum Cut problem on split graphs, along with a subexponential time algorithm for its decision variant.
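
For readers unfamiliar with the problem, the sketch below shows an exhaustive Maximum Cut baseline that tries all bipartitions of the vertex set. It is not the algorithm from the paper: it runs in $O^{*}(2^{n})$ time and only illustrates the problem definition that the $O^{*}(1.42^{n})$ bound improves on. The function name `max_cut_bruteforce` and the example split graph are hypothetical and chosen for this illustration.

```python
from itertools import product


def max_cut_bruteforce(n, edges):
    """Return (best_value, best_side) for a graph on vertices 0..n-1, n >= 1.

    best_side[v] is 0 or 1, the side of the cut that vertex v is placed on.
    Vertex 0 is fixed on side 0 because a cut and its complement have the
    same value, so only 2^(n-1) assignments need to be checked.
    """
    best_value, best_side = 0, (0,) * n
    for rest in product((0, 1), repeat=n - 1):
        side = (0,) + rest
        value = sum(1 for u, v in edges if side[u] != side[v])
        if value > best_value:
            best_value, best_side = value, side
    return best_value, best_side


# Hypothetical split graph: clique {0, 1, 2} and independent set {3, 4}.
edges = [(0, 1), (0, 2), (1, 2), (3, 0), (3, 1), (4, 2)]
value, side = max_cut_bruteforce(5, edges)
```

On this example at most five of the six edges can be cut, since any bipartition leaves at least one edge of the triangle {0, 1, 2} uncut.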

[124]  arXiv:2405.20600 [pdf, other]
Title: Multi-label Class Incremental Emotion Decoding with Augmented Emotional Semantics Learning
Subjects: Artificial Intelligence (cs.AI)

Emotion decoding plays an important role in affective human-computer interaction. However, previous studies ignored the dynamic real-world scenario, where humans experience a blend of multiple emotions which are incrementally integrated into the model, leading to the multi-label class incremental learning (MLCIL) problem. Existing methods have difficulty solving the MLCIL problem due to notorious catastrophic forgetting caused by the partial label problem and inadequate label semantics mining. In this paper, we propose an augmented emotional semantics learning framework for multi-label class incremental emotion decoding. Specifically, we design an augmented emotional relation graph module with label disambiguation to handle the past-missing partial label problem. Then, we leverage domain knowledge from affective dimension space to alleviate the future-missing partial label problem by knowledge distillation. In addition, an emotional semantics learning module is constructed with a graph autoencoder to obtain emotion embeddings in order to guide the semantic-specific feature decoupling for better multi-label learning. Extensive experiments on three datasets show the superiority of our method for improving emotion decoding performance and mitigating forgetting on the MLCIL problem.

[125]  arXiv:2405.20602 [pdf, other]
Title: Masked Language Modeling Becomes Conditional Density Estimation for Tabular Data Synthesis
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

In this paper, our goal is to generate synthetic data for heterogeneous (mixed-type) tabular datasets with high machine learning utility (MLu). Given that the MLu performance relies on accurately approximating the conditional distributions, we focus on devising a synthetic data generation method based on conditional distribution estimation. We propose a novel synthetic data generation method, MaCoDE, by redefining the multi-class classification task of Masked Language Modeling (MLM) as histogram-based non-parametric conditional density estimation. Our proposed method enables estimating conditional densities across arbitrary combinations of target and conditional variables. Furthermore, we demonstrate that our proposed method bridges the theoretical gap between distributional learning and MLM. To validate the effectiveness of our proposed model, we conduct synthetic data generation experiments on 10 real-world datasets. Given the analogy between predicting masked input tokens in MLM and missing data imputation, we also evaluate the performance of multiple imputations on incomplete datasets with various missing data mechanisms. Moreover, our proposed model offers the advantage of enabling adjustments to data privacy levels without requiring re-training.

[126]  arXiv:2405.20603 [pdf, ps, other]
Title: Advancing Financial Risk Prediction Through Optimized LSTM Model Performance and Comparative Analysis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This paper focuses on the application and optimization of the LSTM model in financial risk prediction. The study starts with an overview of the architecture and algorithmic foundation of LSTM, then details the model training process and hyperparameter tuning strategy, and adjusts network parameters through experiments to improve performance. Comparative experiments show that the optimized LSTM model has significant advantages in the AUC index compared with random forest, BP neural network and XGBoost. This verifies its efficiency and practicability in the field of financial risk prediction, especially its ability to deal with complex time series data, and lays a solid foundation for the application of the model in an actual production environment.

[127]  arXiv:2405.20605 [pdf, other]
Title: Searching for internal symbols underlying deep learning
Comments: 10 pages, 7 figures, 3 tables and Appendix
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Deep learning (DL) enables deep neural networks (DNNs) to automatically learn complex tasks or rules from given examples without instructions or guiding principles. As we do not engineer DNNs' functions, it is extremely difficult to diagnose their decisions, and multiple lines of studies proposed to explain principles of DNNs/DL operations. Notably, one line of studies suggests that DNNs may learn concepts, the high level features recognizable to humans. Thus, we hypothesized that DNNs develop abstract codes, not necessarily recognizable to humans, which can be used to augment DNNs' decision-making. To address this hypothesis, we combined foundation segmentation models and unsupervised learning to extract internal codes and identify potential use of abstract codes to make DL's decision-making more reliable and safer.

[128]  arXiv:2405.20606 [pdf, other]
Title: Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)

Supervised and self-supervised learning are two main training paradigms for skeleton-based human action recognition. However, the former one-hot classification requires labor-intensive annotations of predefined action categories, while the latter involves skeleton transformations (e.g., cropping) in the pretext tasks that may impair the skeleton structure. To address these challenges, we introduce a novel skeleton-based training framework (C$^2$VL) based on Cross-modal Contrastive learning that uses progressive distillation to learn task-agnostic human skeleton action representation from the Vision-Language knowledge prompts. Specifically, we establish the vision-language action concept space through vision-language knowledge prompts generated by pre-trained large multimodal models (LMMs), which enrich the fine-grained details that the skeleton action space lacks. Moreover, we propose the intra-modal self-similarity and inter-modal cross-consistency softened targets in the cross-modal contrastive process to progressively control and guide the degree of pulling vision-language knowledge prompts and corresponding skeletons closer. These soft instance discrimination and self-knowledge distillation strategies contribute to the learning of better skeleton-based action representations from the noisy skeleton-vision-language pairs. During the inference phase, our method requires only the skeleton data as the input for action recognition and no longer needs vision-language prompts. Extensive experiments show that our method achieves state-of-the-art results on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets. The code will be available in the future.

[129]  arXiv:2405.20607 [pdf, other]
Title: Textual Inversion and Self-supervised Refinement for Radiology Report Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and ignoring report content constraints. In this paper, we propose Textual Inversion and Self-supervised Refinement (TISR) to address these two issues. Specifically, textual inversion can project text and image into the same space by representing images as pseudo words to eliminate the cross-modal gap. Subsequently, self-supervised refinement refines these pseudo words through contrastive loss computation between images and texts, enhancing the fidelity of generated reports to images. Notably, TISR is orthogonal to most existing methods and is plug-and-play. We conduct experiments on two widely-used public datasets and achieve significant improvements on various baselines, which demonstrates the effectiveness and generalization of TISR. The code will be available soon.

[130]  arXiv:2405.20608 [pdf, other]
Title: Identifying while Learning for Document Event Causality Identification
Comments: Accepted at ACL 2024
Subjects: Computation and Language (cs.CL)

Event Causality Identification (ECI) aims to detect whether there exists a causal relation between two events in a document. Existing studies adopt a kind of identifying after learning paradigm, where events' representations are first learned and then used for the identification. Furthermore, they mainly focus on the causality existence, but ignore causal direction. In this paper, we take care of the causal direction and propose a new identifying while learning mode for the ECI task. We argue that a few causal relations can be easily identified with high confidence, and the directionality and structure of these identified causalities can be utilized to update events' representations for boosting the next round of causality identification. To this end, this paper designs an iterative learning and identifying framework: In each iteration, we construct an event causality graph, on which events' causal structure representations are updated for boosting causal identification. Experiments on two public datasets show that our approach outperforms the state-of-the-art algorithms in both evaluations for causality existence identification and direction identification.

[131]  arXiv:2405.20609 [pdf, ps, other]
Title: Psychological Antecedents to Emergence of Team Autonomy in Agile Scrum Teams
Comments: 17 pages, 5 figures
Subjects: Software Engineering (cs.SE)

The purpose of this research study was to examine the influence of key psychological factors on the emergence of Agile team autonomy that leads to Agile project success in software organizations.

[132]  arXiv:2405.20610 [pdf, other]
Title: Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation
Comments: 14 pages, 5 figures, submitted to IEEE TPAMI. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In semi-supervised semantic segmentation, the Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems. However, despite their high performance, these approaches frequently involve complex training pipelines and a substantial computational burden, limiting the scalability and compatibility of these methods. In this paper, we propose a PrevMatch framework that effectively mitigates the aforementioned limitations by maximizing the utilization of the temporal knowledge obtained during the training process. The PrevMatch framework relies on two core strategies: (1) we reconsider the use of temporal knowledge and thus directly utilize previous models obtained during training to generate additional pseudo-label guidance, referred to as previous guidance. (2) we design a highly randomized ensemble strategy to maximize the effectiveness of the previous guidance. Experimental results on four benchmark semantic segmentation datasets confirm that the proposed method consistently outperforms existing methods across various evaluation protocols. In particular, with DeepLabV3+ and ResNet-101 network settings, PrevMatch outperforms the existing state-of-the-art method, Diverse Co-training, by +1.6 mIoU on Pascal VOC with only 92 annotated images, while achieving 2.4 times faster training. Furthermore, the results indicate that PrevMatch induces stable optimization, particularly in benefiting classes that exhibit poor performance. Code is available at https://github.com/wooseok-shin/PrevMatch

[133]  arXiv:2405.20611 [pdf, ps, other]
Title: Bi-Directional Transformers vs. word2vec: Discovering Vulnerabilities in Lifted Compiled Code
Comments: 8 pages, 0 figures, IEEE 4th Cyber Awareness and Research Symposium 2024 (CARS'24)
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG); Software Engineering (cs.SE)

Detecting vulnerabilities within compiled binaries is challenging due to lost high-level code structures and other factors such as architectural dependencies, compilers, and optimization options. To address these obstacles, this research explores vulnerability detection by using natural language processing (NLP) embedding techniques with word2vec, BERT, and RoBERTa to learn semantics from intermediate representation (LLVM) code. Long short-term memory (LSTM) neural networks were trained on embeddings from encoders created using approximately 118k LLVM functions from the Juliet dataset. This study is pioneering in its comparison of word2vec models with multiple bidirectional transformer (BERT, RoBERTa) embeddings built using LLVM code to train neural networks to detect vulnerabilities in compiled binaries. word2vec Continuous Bag of Words (CBOW) models achieved 92.3% validation accuracy in detecting vulnerabilities, outperforming word2vec Skip-Gram, BERT, and RoBERTa. This suggests that complex contextual NLP embeddings may not provide advantages over simpler word2vec models for this task when a limited number (e.g. 118K) of data samples are used to train the bidirectional transformer-based models. The comparative results provide novel insights into selecting optimal embeddings for learning compiler-independent semantic code representations to advance machine learning detection of vulnerabilities in compiled binaries.

[134]  arXiv:2405.20612 [pdf, other]
Title: UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) have demonstrated impressive capabilities in various tasks using the in-context learning (ICL) paradigm. However, their effectiveness is often compromised by inherent bias, leading to prompt brittleness, i.e., sensitivity to design settings such as example selection, order, and prompt formatting. Previous studies have addressed LLM bias through external adjustment of model outputs, but the internal mechanisms that lead to such bias remain unexplored. Our work delves into these mechanisms, particularly investigating how feedforward neural networks (FFNs) and attention heads result in the bias of LLMs. By interpreting the contribution of individual FFN vectors and attention heads, we identify the biased LLM components that skew LLMs' prediction toward specific labels. To mitigate these biases, we introduce UniBias, an inference-only method that effectively identifies and eliminates biased FFN vectors and attention heads. Extensive experiments across 12 NLP datasets demonstrate that UniBias significantly enhances ICL performance and alleviates prompt brittleness of LLMs.

[135]  arXiv:2405.20613 [pdf, other]
Title: FineRadScore: A Radiology Report Line-by-Line Evaluation Technique Generating Corrections with Severity Scores
Subjects: Computation and Language (cs.CL)

The current gold standard for evaluating generated chest x-ray (CXR) reports is through radiologist annotations. However, this process can be extremely time-consuming and costly, especially when evaluating large numbers of reports. In this work, we present FineRadScore, a Large Language Model (LLM)-based automated evaluation metric for generated CXR reports. Given a candidate report and a ground-truth report, FineRadScore gives the minimum number of line-by-line corrections required to go from the candidate to the ground-truth report. Additionally, FineRadScore provides an error severity rating with each correction and generates comments explaining why the correction was needed. We demonstrate that FineRadScore's corrections and error severity scores align with radiologist opinions. We also show that, when used to judge the quality of the report as a whole, FineRadScore aligns with radiologists as well as current state-of-the-art automated CXR evaluation metrics. Finally, we analyze FineRadScore's shortcomings to provide suggestions for future improvements.

[136]  arXiv:2405.20614 [pdf, other]
Title: EPIDetect: Video-based convulsive seizure detection in chronic epilepsy mouse model for anti-epilepsy drug screening
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In preclinical translational studies, drug candidates with remarkable anti-epileptic efficacy demonstrate long-term suppression of spontaneous recurrent seizures (SRSs), particularly convulsive seizures (CSs), in mouse models of chronic epilepsy. However, the current methods for monitoring CSs have limitations in terms of invasiveness, specific laboratory settings, high cost, and complex operation, which hinder drug screening efforts. In this study, a camera-based system for automated detection of CSs in chronically epileptic mice is first established to screen potential anti-epilepsy drugs.

[137]  arXiv:2405.20618 [pdf, other]
Title: CPAFT: A Consistent Parallel Advancing Front Technique for Unstructured Triangular/Tetrahedral Mesh Generation
Subjects: Numerical Analysis (math.NA); Computational Geometry (cs.CG)

Compared with the remarkable progress made in parallel numerical solvers of partial differential equations, the development of algorithms for generating unstructured triangular/tetrahedral meshes has been relatively sluggish. In this paper, we propose a novel, consistent parallel advancing front technique (CPAFT) by combining the advancing front technique, the domain decomposition method based on space-filling curves, the distributed forest-of-overlapping-trees approach, and the consistent parallel maximal independent set algorithm. The newly proposed CPAFT algorithm can mathematically ensure that the generated unstructured triangular/tetrahedral meshes are independent of the number of processors and the implementation of domain decomposition. Several numerical tests are conducted to validate the parallel consistency and outstanding parallel efficiency of the proposed algorithm, which scales effectively up to two thousand processors. This is, as far as we know, the first parallel unstructured triangular/tetrahedral mesh generator with scalability to O(1,000) CPU processors.

[138]  arXiv:2405.20620 [pdf, other]
Title: "Forgetting" in Machine Learning and Beyond: A Survey
Subjects: Machine Learning (cs.LG)

This survey investigates the multifaceted nature of forgetting in machine learning, drawing insights from neuroscientific research that posits forgetting as an adaptive function rather than a defect, enhancing the learning process and preventing overfitting. This survey focuses on the benefits of forgetting and its applications across various machine learning sub-fields that can help improve model performance and enhance data privacy. Moreover, the paper discusses current challenges, future directions, and ethical considerations regarding the integration of forgetting mechanisms into machine learning models.

[139]  arXiv:2405.20622 [pdf, other]
Title: Superfast Selection for Decision Tree Algorithms
Subjects: Machine Learning (cs.LG)

We present a novel and systematic method, called Superfast Selection, for selecting the "optimal split" for decision tree and feature selection algorithms over tabular data. The method speeds up split selection on a single feature by lowering the time complexity from O(MN) (using the standard selection methods) to O(M), where M represents the number of input examples and N the number of unique values. Additionally, the need for pre-encoding, such as one-hot or integer encoding, for feature value heterogeneity is eliminated. To demonstrate the efficiency of Superfast Selection, we empower the CART algorithm by integrating Superfast Selection into it, creating what we call Ultrafast Decision Tree (UDT). This enhancement enables UDT to complete the training process with a time complexity of O(KMlogM) (K is the number of features). Additionally, the Training Only Once Tuning enables UDT to avoid the repetitive training process required to find the optimal hyper-parameter. Experiments show that the UDT can finish a single training on the KDD99-10% dataset (494K examples with 41 features) within 1 second and tuning with 214.8 sets of hyper-parameters within 0.25 second on a laptop.
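As an illustration of how a drop from O(MN) to roughly O(M) can arise for a single feature, the sketch below makes one pass over the M examples to accumulate per-value class counts and then scores every candidate split from those counts alone. This is a generic counting approach, not necessarily the Superfast Selection procedure of the paper; the function names, the equality-style split (`feature == value`), and the Gini criterion are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def gini(counts, total):
    """Gini impurity of a class-count table holding `total` examples."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_equality_split(values, labels):
    """Return (value, weighted_gini) for the best 'feature == value' split.

    Hypothetical helper: one O(M) pass builds per-value class counts, then
    each of the N unique values is scored from the counts without rescanning
    the data. Raw feature values are used as dictionary keys, so no one-hot
    or integer encoding is needed.
    """
    per_value = defaultdict(Counter)
    overall = Counter()
    for v, y in zip(values, labels):  # single pass over the examples
        per_value[v][y] += 1
        overall[y] += 1

    m = len(labels)
    best = None
    for v, left in per_value.items():
        n_left = sum(left.values())
        right = overall - left  # class counts of the remaining examples
        n_right = m - n_left
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / m
        if best is None or score < best[1]:
            best = (v, score)
    return best


if __name__ == "__main__":
    feature = ["a", "a", "b", "b", "c", "c"]
    target = [1, 1, 0, 0, 0, 0]
    print(best_equality_split(feature, target))
```

The scoring loop costs O(N·C) for C classes, which is small next to the O(M) counting pass whenever N·C is much smaller than M.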

[140]  arXiv:2405.20623 [pdf, other]
Title: Prune at the Clients, Not the Server: Accelerated Sparse Training in Federated Learning
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

In the recent paradigm of Federated Learning (FL), multiple clients train a shared model while keeping their local data private. Resource constraints of clients and communication costs pose major problems for training large models in FL. On the one hand, addressing the resource limitations of the clients, sparse training has proven to be a powerful tool in the centralized setting. On the other hand, communication costs in FL can be addressed by local training, where each client takes multiple gradient steps on its local data. Recent work has shown that local training can provably achieve the optimal accelerated communication complexity [Mishchenko et al., 2022]. Hence, one would like an accelerated sparse training algorithm. In this work we show that naive integration of sparse training and acceleration at the server fails, and how to fix it by letting the clients perform these tasks appropriately. We introduce Sparse-ProxSkip, our method developed for the nonconvex setting, inspired by RandProx [Condat and Richtárik, 2022], which provably combines sparse training and acceleration in the convex setting. We demonstrate the good performance of Sparse-ProxSkip in extensive experiments.

[141]  arXiv:2405.20624 [pdf, ps, other]
Title: Leveraging Large Language Models for Entity Matching
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Entity matching (EM) is a critical task in data integration, aiming to identify records across different datasets that refer to the same real-world entities. Traditional methods often rely on manually engineered features and rule-based systems, which struggle with diverse and unstructured data. The emergence of Large Language Models (LLMs) such as GPT-4 offers transformative potential for EM, leveraging their advanced semantic understanding and contextual capabilities. This vision paper explores the application of LLMs to EM, discussing their advantages, challenges, and future research directions. Additionally, we review related work on applying weak supervision and unsupervised approaches to EM, highlighting how LLMs can enhance these methods.

[142]  arXiv:2405.20625 [pdf, other]
Title: Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning
Subjects: Artificial Intelligence (cs.AI)

As the applicability of Large Language Models (LLMs) extends beyond traditional text processing tasks, there is a burgeoning interest in their potential to excel in planning and reasoning assignments, realms traditionally reserved for System 2 cognitive competencies. Despite their perceived versatility, the research community is still unraveling effective strategies to harness these models in such complex domains. The recent discourse introduced by the paper on LLM Modulo marks a significant stride, proposing a conceptual framework that enhances the integration of LLMs into diverse planning and reasoning activities. This workshop paper delves into the practical application of this framework within the domain of travel planning, presenting a specific instance of its implementation. We use the Travel Planning benchmark by the OSU NLP group, a benchmark for evaluating the performance of LLMs in producing valid itineraries based on user queries presented in natural language. While popular methods of enhancing the reasoning abilities of LLMs such as Chain of Thought, ReAct, and Reflexion achieve a meager 0%, 0.6%, and 0% with GPT3.5-Turbo respectively, our operationalization of the LLM-Modulo framework for the TravelPlanning domain provides a remarkable improvement, enhancing baseline performances by 4.6x for GPT4-Turbo and even more for older models like GPT3.5-Turbo from 0% to 5%. Furthermore, we highlight the other useful roles of LLMs in the planning pipeline, as suggested in LLM-Modulo, which can be reliably operationalized such as extraction of useful critics and reformulator for critics.

[143]  arXiv:2405.20626 [pdf, other]
Title: Causal Distillation for Alleviating Performance Heterogeneity in Recommender Systems
Comments: TKDE 2023
Subjects: Information Retrieval (cs.IR); Information Theory (cs.IT)

Recommendation performance usually exhibits a long-tail distribution over users -- a small portion of head users enjoy much more accurate recommendation services than the others. We reveal two sources of this performance heterogeneity problem: the uneven distribution of historical interactions (a natural source); and the biased training of recommender models (a model source). As addressing this problem cannot sacrifice the overall performance, a wise choice is to eliminate the model bias while maintaining the natural heterogeneity. The key to debiased training lies in eliminating the effect of confounders that influence both the user's historical behaviors and the next behavior. The emerging causal recommendation methods achieve this by modeling the causal effect between user behaviors, but potentially neglect unobserved confounders (e.g., friend suggestions) that are hard to measure in practice. To address unobserved confounders, we resort to the front-door adjustment (FDA) in causal theory and propose a causal multi-teacher distillation framework (CausalD). FDA requires proper mediators in order to estimate the causal effects of historical behaviors on the next behavior. To achieve this, we equip CausalD with multiple heterogeneous recommendation models to model the mediator distribution. Then, the causal effect estimated by FDA is the expectation of recommendation prediction over the mediator distribution and the prior distribution of historical behaviors, which is technically achieved by multi-teacher ensemble. To pursue efficient inference, CausalD further distills multiple teachers into one student model to directly infer the causal effect for making recommendations.

[144]  arXiv:2405.20628 [pdf, other]
Title: ToxVidLLM: A Multimodal LLM-based Framework for Toxicity Detection in Code-Mixed Videos
Comments: ACL Findings 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

In an era of rapidly evolving internet technology, the surge in multimodal content, including videos, has expanded the horizons of online communication. However, the detection of toxic content in this diverse landscape, particularly in low-resource code-mixed languages, remains a critical challenge. While substantial research has addressed toxic content detection in textual data, the realm of video content, especially in non-English languages, has been relatively underexplored. This paper addresses this research gap by introducing a benchmark dataset, the first of its kind, consisting of 931 videos with 4021 code-mixed Hindi-English utterances collected from YouTube. Each utterance within this dataset has been meticulously annotated for toxicity, severity, and sentiment labels. We have developed an advanced Multimodal Multitask framework built for Toxicity detection in Video Content by leveraging Large Language Models (LLMs), crafted for the primary objective along with the additional tasks of conducting sentiment and severity analysis. ToxVidLLM incorporates three key modules (the Encoder module, the Cross-Modal Synchronization module, and the Multitask module), crafting a generic multimodal LLM customized for intricate video classification tasks. Our experiments reveal that incorporating multiple modalities from the videos substantially enhances the performance of toxic content detection, achieving an Accuracy and Weighted F1 score of 94.29% and 94.35%, respectively.

[145]  arXiv:2405.20630 [pdf, other]
Title: Stochastic Optimal Control for Diffusion Bridges in Function Spaces
Subjects: Machine Learning (cs.LG)

Recent advancements in diffusion models and diffusion bridges primarily focus on finite-dimensional spaces, yet many real-world problems necessitate operations in infinite-dimensional function spaces for more natural and interpretable formulations. In this paper, we present a theory of stochastic optimal control (SOC) tailored to infinite-dimensional spaces, aiming to extend diffusion-based algorithms to function spaces. Specifically, we demonstrate how Doob's $h$-transform, the fundamental tool for constructing diffusion bridges, can be derived from the SOC perspective and expanded to infinite dimensions. This expansion presents a challenge, as infinite-dimensional spaces typically lack closed-form densities. Leveraging our theory, we establish that solving the optimal control problem with a specific objective function choice is equivalent to learning diffusion-based generative models. We propose two applications: (1) learning bridges between two infinite-dimensional distributions and (2) generative models for sampling from an infinite-dimensional distribution. Our approach proves effective for diverse problems involving continuous function space representations, such as resolution-free images, time-series data, and probability density functions.

[146]  arXiv:2405.20631 [pdf, ps, other]
Title: Optimizing Contracts in Principal-Agent Team Production
Authors: Shiliang Zuo
Subjects: Computer Science and Game Theory (cs.GT)

I study a principal-agent team production model. The principal hires a team of agents to participate in a common production task. The exact effort of each agent is unobservable and unverifiable, but the total production outcome (e.g. the total revenue) can be observed. The principal incentivizes the agents to exert effort through contracts. Specifically, the principal promises that each agent receives a pre-specified amount of share of the total production output. The principal is interested in finding the optimal profit-sharing rule that maximizes her own utility. I identify a condition under which the principal's optimization problem can be reformulated as solving a family of convex programs, thereby showing the optimal contract can be found efficiently.

[147]  arXiv:2405.20633 [pdf, other]
Title: Action-OOD: An End-to-End Skeleton-Based Model for Robust Out-of-Distribution Human Action Detection
Comments: Under consideration at Computer Vision and Image Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Human action recognition is a crucial task in computer vision systems. However, in real-world scenarios, human actions often fall outside the distribution of training data, requiring a model to both recognize in-distribution (ID) actions and reject out-of-distribution (OOD) ones. Despite its importance, there has been limited research on OOD detection in human actions. Existing works on OOD detection mainly focus on image data with RGB structure, and many methods are post-hoc in nature. While these methods are convenient and computationally efficient, they often lack sufficient accuracy and fail to consider the presence of OOD samples. To address these challenges, we propose a novel end-to-end skeleton-based model called Action-OOD, specifically designed for OOD human action detection. Unlike some existing approaches that may require prior knowledge of existing OOD data distribution, our model solely utilizes in-distribution (ID) data during the training stage, effectively mitigating the overconfidence issue prevalent in OOD detection. We introduce an attention-based feature fusion block, which enhances the model's capability to recognize unknown classes while preserving classification accuracy for known classes. Further, we present a novel energy-based loss function and successfully integrate it with the traditional cross-entropy loss to maximize the separation of data distributions between ID and OOD. Through extensive experiments conducted on NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics-400 datasets, we demonstrate the superior performance of our proposed approach compared to state-of-the-art methods. Our findings underscore the effectiveness of classic OOD detection techniques in the context of skeleton-based action recognition tasks, offering promising avenues for future research in this field. Code will be available at: https://github.com/YilliaJing/Action-OOD.git.
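For readers unfamiliar with energy-based OOD detection, the sketch below shows the widely used energy score computed from classifier logits, E(x) = -T * logsumexp(logits / T), where lower energy indicates a more in-distribution input. It is not the loss function proposed in the Action-OOD paper; the function names, the temperature default, and the threshold value are assumptions chosen for illustration.

```python
import math


def energy_score(logits, temperature=1.0):
    """Standard energy score: -T * log(sum(exp(logit / T))).

    Uses the max-subtraction trick so large logits do not overflow.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return -temperature * log_sum_exp


def is_ood(logits, threshold, temperature=1.0):
    """Flag an input as OOD when its energy exceeds a chosen threshold.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out in-distribution data.
    """
    return energy_score(logits, temperature) > threshold


if __name__ == "__main__":
    confident = [10.0, 0.0, 0.0]  # one class dominates: low energy
    flat = [0.0, 0.0, 0.0]        # no class stands out: higher energy
    print(energy_score(confident), energy_score(flat))
    print(is_ood(confident, threshold=-5.0), is_ood(flat, threshold=-5.0))
```

A training loss built on this quantity would typically push the energies of ID samples down while keeping a margin from the rest, but the exact form used by the authors is not given in the abstract.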

[148]  arXiv:2405.20640 [pdf, other]
Title: Heterophilous Distribution Propagation for Graph Neural Networks
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Graph Neural Networks (GNNs) have achieved remarkable success in various graph mining tasks by aggregating information from neighborhoods for representation learning. The success relies on the homophily assumption that nearby nodes exhibit similar behaviors, while it may be violated in many real-world graphs. Recently, heterophilous graph neural networks (HeterGNNs) have attracted increasing attention by modifying the neural message passing schema for heterophilous neighborhoods. However, they suffer from insufficient neighborhood partition and heterophily modeling, both of which are critical but challenging to break through. To tackle these challenges, in this paper, we propose heterophilous distribution propagation (HDP) for graph neural networks. Instead of aggregating information from all neighborhoods, HDP adaptively separates the neighbors into homophilous and heterophilous parts based on the pseudo assignments during training. The heterophilous neighborhood distribution is learned with an orthogonality-oriented constraint via a trusted prototype contrastive learning paradigm. Both the homophilous and heterophilous patterns are propagated with a novel semantic-aware message passing mechanism. We conduct extensive experiments on 9 benchmark datasets with different levels of homophily. Experimental results show that our method outperforms representative baselines on heterophilous datasets.

[149]  arXiv:2405.20641 [pdf, other]
Title: Query Provenance Analysis for Robust and Efficient Query-based Black-box Attack Defense
Subjects: Cryptography and Security (cs.CR)

Query-based black-box attacks have emerged as a significant threat to machine learning systems, where adversaries can manipulate the input queries to generate adversarial examples that can cause misclassification of the model. To counter these attacks, researchers have proposed Stateful Defense Models (SDMs) for detecting adversarial query sequences and rejecting queries that are "similar" to the history queries. Existing state-of-the-art (SOTA) SDMs (e.g., BlackLight and PIHA) have shown great effectiveness in defending against these attacks. However, recent studies have shown that they are vulnerable to Oracle-guided Adaptive Rejection Sampling (OARS) attacks, which is a stronger adaptive attack strategy. It can be easily integrated with existing attack algorithms to evade the SDMs by generating queries with fine-tuned direction and step size of perturbations utilizing the leaked decision information from the SDMs.
In this paper, we propose a novel approach, Query Provenance Analysis (QPA), for more robust and efficient SDMs. QPA encapsulates the historical relationships among queries as the sequence feature to capture the fundamental difference between benign and adversarial query sequences. To utilize the query provenance, we propose an efficient query provenance analysis algorithm with dynamic management. We evaluate QPA against two baselines, BlackLight and PIHA, on four widely used datasets with six query-based black-box attack algorithms. The results show that QPA outperforms the baselines in terms of defense effectiveness and efficiency on both non-adaptive and adaptive attacks. Specifically, QPA reduces the Attack Success Rate (ASR) of OARS to 4.08%, compared to 77.63% and 87.72% for BlackLight and PIHA, respectively. Moreover, QPA also achieves 7.67x and 2.25x higher throughput than BlackLight and PIHA.

[150]  arXiv:2405.20642 [pdf, other]
Title: Principal-Agent Multitasking: the Uniformity of Optimal Contracts and its Efficient Learning via Instrumental Regression
Authors: Shiliang Zuo
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This work studies the multitasking principal-agent problem. I first show a "uniformity" result. Specifically, when the tasks are perfect substitutes, and the agent's cost function is homogeneous to a certain degree, then the optimal contract only depends on the marginal utility of each task and the degree of homogeneity. I then study a setting where the marginal utility of each task is unknown so that the optimal contract must be learned or estimated with observational data. I identify this problem as a regression problem with measurement errors and observe that this problem can be cast as an instrumental regression problem. This work observes that both the contract and the repeated observations (when available) can act as valid instrumental variables, and proposes using the generalized method of moments estimator to compute an approximately optimal contract from offline data. I also study an online setting and show how the optimal contract can be efficiently learned in an online fashion using the two estimators. Here the principal faces an exploration-exploitation tradeoff: she must experiment with new contracts and observe their outcome whilst at the same time ensuring her experimentations are not deviating too much from the optimal contract. This work shows that when repeated observations are available and agents are sufficiently "diverse", the principal can achieve a very low $\widetilde{O}(d)$ cumulative utility loss, even with a "pure exploitation" algorithm.

[151]  arXiv:2405.20643 [pdf, other]
Title: Learning Gaze-aware Compositional GAN
Comments: Accepted by ETRA 2024 as Full paper, and as journal paper in Proceedings of the ACM on Computer Graphics and Interactive Techniques
Journal-ref: Proceedings of the ACM on Computer Graphics and Interactive Techniques, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Gaze-annotated facial data is crucial for training deep neural networks (DNNs) for gaze estimation. However, obtaining these data is labor-intensive and requires specialized equipment due to the challenge of accurately annotating the gaze direction of a subject. In this work, we present a generative framework to create annotated gaze data by leveraging the benefits of labeled and unlabeled data sources. We propose a Gaze-aware Compositional GAN that learns to generate annotated facial images from a limited labeled dataset. Then we transfer this model to an unlabeled data domain to take advantage of the diversity it provides. Experiments demonstrate our approach's effectiveness in generating within-domain image augmentations in the ETH-XGaze dataset and cross-domain augmentations in the CelebAMask-HQ dataset domain for gaze estimation DNN training. We also show additional applications of our work, which include facial image editing and gaze redirection.

[152]  arXiv:2405.20646 [pdf, other]
Title: Large Language Models Enhanced Sequential Recommendation for Long-tail User and Item
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Sequential recommendation systems (SRS) serve the purpose of predicting users' subsequent preferences based on their past interactions and have been applied across various domains such as e-commerce and social networking platforms. However, practical SRS encounters challenges due to the fact that most users engage with only a limited number of items, while the majority of items are seldom consumed. These challenges, termed as the long-tail user and long-tail item dilemmas, often create obstacles for traditional SRS methods. Mitigating these challenges is crucial as they can significantly impact user satisfaction and business profitability. While some research endeavors have alleviated these issues, they still grapple with issues such as seesaw or noise stemming from the scarcity of interactions. The emergence of large language models (LLMs) presents a promising avenue to address these challenges from a semantic standpoint. In this study, we introduce the Large Language Models Enhancement framework for Sequential Recommendation (LLM-ESR), which leverages semantic embeddings from LLMs to enhance SRS performance without increasing computational overhead. To combat the long-tail item challenge, we propose a dual-view modeling approach that fuses semantic information from LLMs with collaborative signals from traditional SRS. To address the long-tail user challenge, we introduce a retrieval augmented self-distillation technique to refine user preference representations by incorporating richer interaction data from similar users. Through comprehensive experiments conducted on three authentic datasets using three widely used SRS models, our proposed enhancement framework demonstrates superior performance compared to existing methodologies.

[153]  arXiv:2405.20648 [pdf, other]
Title: Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)

Video is an increasingly prominent and information-dense medium, yet it poses substantial challenges for language models. A typical video consists of a sequence of shorter segments, or shots, that collectively form a coherent narrative. Each shot is analogous to a word in a sentence where multiple data streams of information (such as visual and auditory data) must be processed simultaneously. Comprehension of the entire video requires not only understanding the visual-audio information of each shot but also requires that the model links the ideas between each shot to generate a larger, all-encompassing story. Despite significant progress in the field, current works often overlook videos' more granular shot-by-shot semantic information. In this project, we propose a family of efficient large language vision models (LLVMs) to boost video summarization and captioning called Shotluck Holmes. By leveraging better pretraining and data collection strategies, we extend the abilities of existing small LLVMs from being able to understand a picture to being able to understand a sequence of frames. Specifically, we show that Shotluck Holmes achieves better performance than state-of-the-art results on the Shot2Story video captioning and summary task with significantly smaller and more computationally efficient models.

[154]  arXiv:2405.20649 [pdf, other]
Title: Reward-based Input Construction for Cross-document Relation Extraction
Comments: Accepted at ACL 2024 main conference
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Relation extraction (RE) is a fundamental task in natural language processing, aiming to identify relations between target entities in text. While many RE methods are designed for a single sentence or document, cross-document RE has emerged to address relations across multiple long documents. Given the nature of long documents in cross-document RE, extracting document embeddings is challenging due to the length constraints of pre-trained language models. Therefore, we propose REward-based Input Construction (REIC), the first learning-based sentence selector for cross-document RE. REIC extracts sentences based on relational evidence, enabling the RE module to effectively infer relations. Since supervision of evidence sentences is generally unavailable, we train REIC using reinforcement learning with RE prediction scores as rewards. Experimental results demonstrate the superiority of our method over heuristic methods for different RE structures and backbones in cross-document RE. Our code is publicly available at https://github.com/aailabkaist/REIC.

[155]  arXiv:2405.20650 [pdf, other]
Title: GenMix: Combining Generative and Mixture Data Augmentation for Medical Image Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose a novel data augmentation technique called GenMix, which combines generative and mixture approaches to leverage the strengths of both methods. While generative models excel at creating new data patterns, they face challenges such as mode collapse in GANs and difficulties in training diffusion models, especially with limited medical imaging data. On the other hand, mixture models enhance class boundary regions but tend to favor the major class in scenarios with class imbalance. To address these limitations, GenMix integrates both approaches to complement each other. GenMix operates in two stages: (1) training a generative model to produce synthetic images, and (2) performing mixup between synthetic and real data. This process improves the quality and diversity of synthetic data while simultaneously benefiting from the new pattern learning of generative models and the boundary enhancement of mixture models. We validate the effectiveness of our method on the task of classifying focal liver lesions (FLLs) in CT images. Our results demonstrate that GenMix enhances the performance of various generative models, including DCGAN, StyleGAN, Textual Inversion, and Diffusion Models. Notably, the proposed method with Textual Inversion outperforms other methods without fine-tuning diffusion model on the FLL dataset.
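
The second stage described above (mixup between synthetic and real data) can be illustrated with a minimal sketch. The abstract does not specify how the mixing coefficient is chosen, so the Beta-distributed coefficient of standard mixup is assumed here; the function name `mixup`, the flat-list image representation, and the `alpha` parameter are hypothetical and are not taken from the paper.

```python
import random


def mixup(real, synthetic, real_label, synthetic_label, alpha=1.0, rng=None):
    """Blend one real and one synthetic sample (standard mixup, assumed).

    `real` and `synthetic` are flat lists of pixel values of equal length;
    the labels are one-hot (or soft) label vectors of equal length.
    """
    if len(real) != len(synthetic) or len(real_label) != len(synthetic_label):
        raise ValueError("samples and labels must have matching shapes")
    rng = rng or random.Random()
    # Mixing coefficient in [0, 1]; Beta(alpha, alpha) is an assumption.
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * r + (1.0 - lam) * s for r, s in zip(real, synthetic)]
    label = [lam * p + (1.0 - lam) * q
             for p, q in zip(real_label, synthetic_label)]
    return mixed, label, lam
```

With one-hot labels, the blended label still sums to 1, so it can be used directly as a soft target for a classifier.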

[156]  arXiv:2405.20652 [pdf, other]
Title: Sign is Not a Remedy: Multiset-to-Multiset Message Passing for Learning on Heterophilic Graphs
Comments: Published as a conference paper at ICML 2024
Subjects: Machine Learning (cs.LG)

Graph Neural Networks (GNNs) have gained significant attention as a powerful modeling and inference method, especially for homophilic graph-structured data. To empower GNNs in heterophilic graphs, where adjacent nodes exhibit dissimilar labels or features, Signed Message Passing (SMP) has been widely adopted. However, there is a lack of theoretical and empirical analysis regarding the limitations of SMP. In this work, we unveil some potential pitfalls of SMP and their remedies. We first identify two limitations of SMP: undesirable representation update for multi-hop neighbors and vulnerability against oversmoothing issues. To overcome these challenges, we propose a novel message passing function called Multiset to Multiset GNN (M2M-GNN). Our theoretical analyses and extensive experiments demonstrate that M2M-GNN effectively alleviates the aforementioned limitations of SMP, yielding superior performance in comparison

[157]  arXiv:2405.20653 [pdf, other]
Title: Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens
Subjects: Artificial Intelligence (cs.AI)

Along with the remarkable successes of large language models (LLMs), recent research has also started to explore the security threats of LLMs, including jailbreaking attacks. Attackers carefully craft jailbreaking prompts such that a target LLM will respond to the harmful question. Existing jailbreaking attacks require either human experts or leveraging complicated algorithms to craft jailbreaking prompts. In this paper, we introduce BOOST, a simple attack that leverages only the eos tokens. We demonstrate that rather than constructing complicated jailbreaking prompts, the attacker can simply append a few eos tokens to the end of a harmful question. It will bypass the safety alignment of LLMs and lead to successful jailbreaking attacks. We further apply BOOST to four representative jailbreak methods and show that the attack success rates of these methods can be significantly enhanced by simply adding eos tokens to the prompt. To understand this simple but novel phenomenon, we conduct empirical analyses. Our analysis reveals that adding eos tokens makes the target LLM believe the input is much less harmful, and eos tokens have low attention values and do not affect LLM's understanding of the harmful questions, leading the model to actually respond to the questions. Our findings uncover how fragile an LLM is against jailbreak attacks, motivating the development of strong safety alignment approaches.

[158]  arXiv:2405.20654 [pdf, other]
Title: Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models
Comments: Accepted at Gen-IR@SIGIR24
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Effective passage retrieval and reranking methods have been widely utilized to identify suitable candidates in open-domain question answering tasks. Recent studies have resorted to LLMs for reranking the retrieved passages by the log-likelihood of the question conditioned on each passage. Although these methods have demonstrated promising results, the performance is notably sensitive to the human-written prompt (or hard prompt), and fine-tuning LLMs can be computationally intensive and time-consuming. Furthermore, this approach limits the leverage of question-passage relevance pairs and passage-specific knowledge to enhance the ranking capabilities of LLMs. In this paper, we propose passage-specific prompt tuning for reranking in open-domain question answering (PSPT): a parameter-efficient method that fine-tunes learnable passage-specific soft prompts, incorporating passage-specific knowledge from a limited set of question-passage relevance pairs. The method involves ranking retrieved passages based on the log-likelihood of the model generating the question conditioned on each passage and the learned soft prompt. We conducted extensive experiments utilizing the Llama-2-chat-7B model across three publicly available open-domain question answering datasets and the results demonstrate the effectiveness of the proposed approach.

[159]  arXiv:2405.20656 [pdf, other]
Title: Automatic Counting and Classification of Mosquito Eggs in Field Traps
Subjects: Artificial Intelligence (cs.AI)

The analysis of the field traps where the mosquitoes insert their eggs is vital to check that the sterile insect technique (SIT) is working properly. This is because the number of hatched eggs may indicate that the sterile males are not competing with the wild ones. Nowadays, the study of the traps is done manually by microscope and is very time-consuming and prone to human error. This paper presents an automatic trap survey. For this purpose, a device has been designed that automatically scans the slat obtaining different overlapping photos. Subsequently, the images are analyzed by a Mask-RCNN neural network that segments the eggs and classifies them into 2 classes: full or hatch

[160]  arXiv:2405.20657 [pdf, other]
Title: DORY: Deliberative Prompt Recovery for LLM
Subjects: Computation and Language (cs.CL)

Prompt recovery in large language models (LLMs) is crucial for understanding how LLMs work and addressing concerns regarding privacy, copyright, etc. The trend towards inference-only APIs complicates this task by restricting access to essential outputs for recovery. To tackle this challenge, we extract prompt-related information from limited outputs and identify a strong (negative) correlation between output probability-based uncertainty and the success of prompt recovery. This finding led to the development of Deliberative PrOmpt RecoverY (DORY), our novel approach that leverages uncertainty to recover prompts accurately. DORY involves reconstructing drafts from outputs, refining these with hints, and filtering out noise based on uncertainty. Our evaluation across diverse LLMs and prompt benchmarks shows that DORY outperforms existing baselines, improving performance by approximately 10.82% and establishing a new state-of-the-art record in prompt recovery tasks. Significantly, DORY operates using a single LLM without any external resources or model, offering a cost-effective, user-friendly prompt recovery solution.

[161]  arXiv:2405.20661 [pdf, other]
Title: An Overview of Quantum Software Engineering in Latin America
Comments: 27 pages, 9 figures
Subjects: Software Engineering (cs.SE)

Quantum computing represents a revolutionary computational paradigm with the potential to address challenges beyond classical computers' capabilities. The development of robust quantum software is indispensable to unlock the full potential of quantum computing. Like classical software, quantum software is expected to be complex and extensive, necessitating the establishment of a specialized field known as Quantum Software Engineering. Recognizing the regional focus on Latin America within this special issue, we have embarked on an in-depth inquiry encompassing a systematic mapping study of existing literature and a comprehensive survey of experts in the field. This rigorous research effort aims to illuminate the current landscape of Quantum Software Engineering initiatives undertaken by universities, research institutes, and companies across Latin America. This exhaustive study aims to provide information on the progress, challenges, and opportunities in Quantum Software Engineering in the Latin American context. By promoting a more in-depth understanding of cutting-edge developments in this burgeoning field, our research aims to serve as a potential stimulus to initiate pioneering initiatives and encourage collaborative efforts among Latin American researchers.

[162]  arXiv:2405.20664 [pdf, other]
Title: Weak Robust Compatibility Between Learning Algorithms and Counterfactual Explanation Generation Algorithms
Authors: Ao Xu, Tieru Wu
Subjects: Machine Learning (cs.LG)

Counterfactual explanation generation is a powerful method for Explainable Artificial Intelligence. It can help users understand why machine learning models make specific decisions, and how to change those decisions. Evaluating the robustness of counterfactual explanation algorithms is therefore crucial. Previous literature has widely studied the robustness based on the perturbation of input instances. However, the robustness defined from the perspective of perturbed instances is sometimes biased, because this definition ignores the impact of learning algorithms on robustness. In this paper, we propose a more reasonable definition, Weak Robust Compatibility, based on the perspective of explanation strength. In practice, we propose WRC-Test to help us generate more robust counterfactuals. Meanwhile, we designed experiments to verify the effectiveness of WRC-Test. Theoretically, we introduce the concepts of PAC learning theory and define the concept of PAC WRC-Approximability. Based on reasonable assumptions, we establish oracle inequalities about weak robustness, which gives a sufficient condition for PAC WRC-Approximability.

[163]  arXiv:2405.20666 [pdf, other]
Title: MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition
Comments: Accepted by TCSVT 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Sign language recognition (SLR) has long been plagued by insufficient model representation capabilities. Although current pre-training approaches have alleviated this dilemma to some extent and yielded promising performance by employing various pretext tasks on sign pose data, these methods still suffer from two primary limitations: 1) Explicit motion information is usually disregarded in previous pretext tasks, leading to partial information loss and limited representation capability. 2) Previous methods focus on the local context of a sign pose sequence, without incorporating the guidance of the global meaning of lexical signs. To this end, we propose a Motion-Aware masked autoencoder with Semantic Alignment (MASA) that integrates rich motion cues and global semantic information in a self-supervised learning paradigm for SLR. Our framework contains two crucial components, i.e., a motion-aware masked autoencoder (MA) and a momentum semantic alignment module (SA). Specifically, in MA, we introduce an autoencoder architecture with a motion-aware masked strategy to reconstruct motion residuals of masked frames, thereby explicitly exploring dynamic motion cues among sign pose sequences. Moreover, in SA, we embed our framework with global semantic awareness by aligning the embeddings of different augmented samples from the input sequence in the shared latent space. In this way, our framework can simultaneously learn local motion cues and global semantic features for comprehensive sign language representation. Furthermore, we conduct extensive experiments to validate the effectiveness of our method, achieving new state-of-the-art performance on four public benchmarks.

[164]  arXiv:2405.20669 [pdf, other]
Title: Fourier123: One Image to High-Quality 3D Object Generation with Hybrid Fourier Score Distillation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Single image-to-3D generation is pivotal for crafting controllable 3D assets. Given its underconstrained nature, we leverage geometric priors from a 3D novel view generation diffusion model and appearance priors from a 2D image generation method to guide the optimization process. We note that a disparity exists between the training datasets of 2D and 3D diffusion models, leading to their outputs showing marked differences in appearance. Specifically, 2D models tend to deliver more detailed visuals, whereas 3D models produce consistent yet over-smooth results across different views. Hence, we optimize a set of 3D Gaussians using 3D priors in spatial domain to ensure geometric consistency, while exploiting 2D priors in the frequency domain through Fourier transform for higher visual quality. This 2D-3D hybrid Fourier Score Distillation objective function (dubbed hy-FSD), can be integrated into existing 3D generation methods, yielding significant performance improvements. With this technique, we further develop an image-to-3D generation pipeline to create high-quality 3D objects within one minute, named Fourier123. Extensive experiments demonstrate that Fourier123 excels in efficient generation with rapid convergence speed and visual-friendly generation results.

[165]  arXiv:2405.20670 [pdf, ps, other]
Title: Twitter should now be referred to as X: How academics, journals and publishers need to make the nomenclatural transition
Subjects: Digital Libraries (cs.DL)

Here, we note how academics, journals and publishers should no longer refer to the social media platform Twitter as such, rather as X. Relying on Google Scholar, we found 16 examples of papers published in the last months of 2023 - essentially during the transition period between Twitter and X - that used Twitter and X, but in different ways. Unlike that transition period in which the binary Twitter/X could have been used in academic papers, we suggest that papers should no longer refer to Twitter as Twitter, but only as X, except for historical studies about that social media platform, because such use would be factually incorrect.

[166]  arXiv:2405.20671 [pdf, other]
Title: Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers
Comments: 73 pages, 20 figures, 90 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to generalize to longer sequences than those encountered during training. To tackle this problem, we propose position coupling, a simple yet effective method that directly embeds the structure of the tasks into the positional encoding of a (decoder-only) Transformer. Taking a departure from the vanilla absolute position mechanism assigning unique position IDs to each of the tokens, we assign the same position IDs to two or more "relevant" tokens; for integer addition tasks, we regard digits of the same significance as in the same position. On the empirical side, we show that with the proposed position coupling, a small (1-layer) Transformer trained on 1 to 30-digit additions can generalize up to 200-digit additions (6.67x of the trained length). On the theoretical side, we prove that a 1-layer Transformer with coupled positions can solve the addition task involving exponentially many digits, whereas any 1-layer Transformer without positional information cannot entirely solve it. We also demonstrate that position coupling can be applied to other algorithmic tasks such as addition with multiple summands, Nx2 multiplication, copy/reverse, and a two-dimensional task.
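
The coupling rule for integer addition (digits of the same significance share one position ID) can be illustrated with a minimal sketch. Only that rule comes from the abstract; the function name `coupled_position_ids`, the `start` offset, the character-level tokenization, and the choice of ID 0 for the operator tokens are assumptions made for illustration and may differ from the paper's actual scheme.

```python
def coupled_position_ids(a: str, b: str, total: str, start: int = 1):
    """Tokenize 'a+b=total' per character and assign coupled position IDs.

    Digits of equal significance in `a`, `b`, and `total` get the same ID.
    Operator tokens ('+', '=') get ID 0 (an illustrative assumption).
    """
    def ids_for(number: str):
        n = len(number)
        # The least significant digit gets `start`, the next one start + 1, ...
        return [start + (n - 1 - i) for i in range(n)]

    tokens = list(a) + ["+"] + list(b) + ["="] + list(total)
    ids = ids_for(a) + [0] + ids_for(b) + [0] + ids_for(total)
    return tokens, ids


tokens, ids = coupled_position_ids("653", "49", "702")
# tokens: 6 5 3 + 4 9 = 7 0 2
# ids:    3 2 1 0 2 1 0 3 2 1
```

In this toy layout the units digits 3, 9, and 2 all share ID 1, whereas a vanilla absolute scheme would give each of the ten tokens a distinct ID.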

[167]  arXiv:2405.20672 [pdf, other]
Title: Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations
Comments: 22 pages, 15 figures (including appendix)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This study explores the impact of adversarial perturbations on Convolutional Neural Networks (CNNs) with the aim of enhancing the understanding of their underlying mechanisms. Despite numerous defense methods proposed in the literature, there is still an incomplete understanding of this phenomenon. Instead of treating the entire model as vulnerable, we propose that specific feature maps learned during training contribute to the overall vulnerability. To investigate how the hidden representations learned by a CNN affect its vulnerability, we introduce the Adversarial Intervention framework. Experiments were conducted on models trained on three well-known computer vision datasets, subjecting them to attacks of different nature. Our focus centers on the effects that adversarial perturbations to a model's initial layer have on the overall behavior of the model. Empirical results revealed compelling insights: a) perturbing selected channel combinations in shallow layers causes significant disruptions; b) the channel combinations most responsible for the disruptions are common among different types of attacks; c) despite shared vulnerable combinations of channels, different attacks affect hidden representations with varying magnitudes; d) there exists a positive correlation between a kernel's magnitude and its vulnerability. In conclusion, this work introduces a novel framework to study the vulnerability of a CNN model to adversarial perturbations, revealing insights that contribute to a deeper understanding of the phenomenon. The identified properties pave the way for the development of efficient ad-hoc defense mechanisms in future applications.

[168]  arXiv:2405.20674 [pdf, other]
Title: 4Diffusion: Multi-view Video Diffusion Model for 4D Generation
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Current 4D generation methods have achieved noteworthy efficacy with the aid of advanced diffusion generative models. However, these methods lack multi-view spatial-temporal modeling and encounter challenges in integrating diverse prior knowledge from multiple diffusion models, resulting in inconsistent temporal appearance and flickers. In this paper, we propose a novel 4D generation pipeline, namely 4Diffusion, aimed at generating spatial-temporally consistent 4D content from a monocular video. We first design a unified diffusion model tailored for multi-view video generation by incorporating a learnable motion module into a frozen 3D-aware diffusion model to capture multi-view spatial-temporal correlations. After training on a curated dataset, our diffusion model acquires reasonable temporal consistency and inherently preserves the generalizability and spatial consistency of the 3D-aware diffusion model. Subsequently, we propose 4D-aware Score Distillation Sampling loss, which is based on our multi-view video diffusion model, to optimize 4D representation parameterized by dynamic NeRF. This aims to eliminate discrepancies arising from multiple diffusion models, allowing for generating spatial-temporally consistent 4D content. Moreover, we devise an anchor loss to enhance the appearance details and facilitate the learning of dynamic NeRF. Extensive qualitative and quantitative experiments demonstrate that our method achieves superior performance compared to previous methods.

[169]  arXiv:2405.20675 [pdf, other]
Title: Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling
Comments: 7 pages, 11 figures, ELLIS Doctoral Symposium 2023 in Helsinki, Finland
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)

Diffusion Probabilistic Models (DPMs) have emerged as a powerful class of deep generative models, achieving remarkable performance in image synthesis tasks. However, these models face challenges in terms of widespread adoption due to their reliance on sequential denoising steps during sample generation. This dependence leads to substantial computational requirements, making them unsuitable for resource-constrained or real-time processing systems. To address these challenges, we propose a novel method that integrates denoising phases directly into the model's architecture, thereby reducing the need for resource-intensive computations. Our approach combines diffusion models with generative adversarial networks (GANs) through knowledge distillation, enabling more efficient training and evaluation. By utilizing a pre-trained diffusion model as a teacher model, we train a student model through adversarial learning, employing layerwise transformations for denoising and submodules for predicting the teacher model's output at various points in time. This integration significantly reduces the number of parameters and denoising steps required, leading to improved sampling speed at test time. We validate our method with extensive experiments, demonstrating comparable performance with reduced computational requirements compared to existing approaches. By enabling the deployment of diffusion models on resource-constrained devices, our research mitigates their computational burden and paves the way for wider accessibility and practical use across the research community and end-users.
Our code is publicly available at https://github.com/kidist-amde/Adv-KD

[170]  arXiv:2405.20677 [pdf, other]
Title: Provably Efficient Interactive-Grounded Learning with Personalized Reward
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions. To deal with personalized rewards that are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with context-dependent feedback, but their algorithm does not come with theoretical guarantees. In this work, we consider the same problem and provide the first provably efficient algorithms with sublinear regret under realizability. Our analysis reveals that the step-function estimator of prior work can deviate uncontrollably due to finite-sample effects. Our solution is a novel Lipschitz reward estimator which underestimates the true reward and enjoys favorable generalization performances. Building on this estimator, we propose two algorithms, one based on explore-then-exploit and the other based on inverse-gap weighting. We apply IGL to learning from image feedback and learning from text feedback, which are reward-free settings that arise in practice. Experimental results showcase the importance of using our Lipschitz reward estimator and the overall effectiveness of our algorithms.

[171]  arXiv:2405.20678 [pdf, ps, other]
Title: No-Regret Learning for Fair Multi-Agent Social Welfare Optimization
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

We consider the problem of online multi-agent Nash social welfare (NSW) maximization. While previous works of Hossain et al. [2021], Jones et al. [2023] study similar problems in stochastic multi-agent multi-armed bandits and show that $\sqrt{T}$-regret is possible after $T$ rounds, their fairness measure is the product of all agents' rewards, instead of their NSW (that is, their geometric mean). Given the fundamental role of NSW in the fairness literature, it is more than natural to ask whether no-regret fair learning with NSW as the objective is possible. In this work, we provide a complete answer to this question in various settings. Specifically, in stochastic $N$-agent $K$-armed bandits, we develop an algorithm with $\widetilde{\mathcal{O}}\left(K^{\frac{2}{N}}T^{\frac{N-1}{N}}\right)$ regret and prove that the dependence on $T$ is tight, making it a sharp contrast to the $\sqrt{T}$-regret bounds of Hossain et al. [2021], Jones et al. [2023]. We then consider a more challenging version of the problem with adversarial rewards. Somewhat surprisingly, despite NSW being a concave function, we prove that no algorithm can achieve sublinear regret. To circumvent such negative results, we further consider a setting with full-information feedback and design two algorithms with $\sqrt{T}$-regret: the first one has no dependence on $N$ at all and is applicable to not just NSW but a broad class of welfare functions, while the second one has better dependence on $K$ and is preferable when $N$ is small. Finally, we also show that logarithmic regret is possible whenever there exists one agent who is indifferent about different arms.
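
The distinction drawn above between the product of the agents' rewards and their NSW (the geometric mean) can be made concrete with a small sketch. The function name `nash_social_welfare` and the example rewards are illustrative assumptions, not taken from the paper.

```python
import math


def nash_social_welfare(rewards):
    """Geometric mean of nonnegative per-agent rewards."""
    if not rewards or any(r < 0 for r in rewards):
        raise ValueError("rewards must be a non-empty list of nonnegative numbers")
    return math.prod(rewards) ** (1.0 / len(rewards))


rewards = [0.25, 1.0]                # hypothetical rewards for N = 2 agents
product = math.prod(rewards)         # product-based fairness measure
nsw = nash_social_welfare(rewards)   # N-th root of the product
```

For these two agents the product is 0.25 while the NSW is 0.5. Note also that the stated $T^{\frac{N-1}{N}}$ dependence reduces to $\sqrt{T}$ when $N = 2$ and approaches linear in $T$ as $N$ grows.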

[172]  arXiv:2405.20680 [pdf, other]
Title: Unraveling and Mitigating Retriever Inconsistencies in Retrieval-Augmented Large Language Models
Comments: ACL 2024 (findings)
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Although Retrieval-Augmented Large Language Models (RALMs) demonstrate their superiority in terms of factuality, they do not consistently outperform the original retrieval-free Language Models (LMs). Our experiments reveal that this example-level performance inconsistency exists not only between retrieval-augmented and retrieval-free LM but also among different retrievers. To understand this phenomenon, we investigate the degeneration behavior of RALMs and theoretically decompose it into four categories. Further analysis based on our decomposition reveals that the innate difference in knowledge sources and the unpredictable degeneration of the reader model contribute most to the inconsistency. Drawing from our analysis, we introduce Ensemble of Retrievers (EoR), a trainable framework that can adaptively retrieve from different knowledge sources and effectively decrease unpredictable reader errors. Our experiments on Open Domain Question Answering show that EoR substantially improves performance over the RALM with a single retriever by considerably reducing inconsistent behaviors.

[173]  arXiv:2405.20681 [pdf, other]
Title: No Free Lunch Theorem for Privacy-Preserving LLM Inference
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Individuals and businesses have benefited significantly from Large Language Models (LLMs) including PaLM, Gemini and ChatGPT in various ways. For example, LLMs enhance productivity, reduce costs, and enable us to focus on more valuable tasks. Furthermore, LLMs possess the capacity to sift through extensive datasets, uncover underlying patterns, and furnish critical insights that propel the frontiers of technology and science. However, LLMs also pose privacy concerns. Users' interactions with LLMs may expose their sensitive personal or company information. A lack of robust privacy safeguards and legal frameworks could permit the unwarranted intrusion or improper handling of individual data, thereby risking infringements of privacy and the theft of personal identities. To ensure privacy, it is essential to minimize the dependency between shared prompts and private information. Various randomization approaches have been proposed to protect prompts' privacy, but they may incur utility loss compared to unprotected LLM prompting. Therefore, it is essential to evaluate the balance between the risk of privacy leakage and loss of utility when conducting effective protection mechanisms. The current study develops a framework for inferring privacy-protected Large Language Models (LLMs) and lays down a solid theoretical basis for examining the interplay between privacy preservation and utility. The core insight is encapsulated within a theorem called the NFL (No-Free-Lunch) Theorem.

[174]  arXiv:2405.20682 [pdf, other]
Title: Impact of Phase Selection on Accuracy and Scalability in Calculating Distributed Energy Resources Hosting Capacity
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Hosting capacity (HC) and dynamic operating envelopes (DOEs), defined as dynamic, time-varying HC, are calculated using three-phase optimal power flow (OPF) formulations. Due to the computational complexity of such optimisation problems, HC and DOE are often calculated by introducing certain assumptions and approximations, including the linearised OPF formulation, which we implement in the Python-based tool ppOPF. Furthermore, we investigate how assumptions of the distributed energy resource (DER) connection phase impact the objective function value and computational time in calculating HC and DOE in distribution networks of different sizes. The results are not unambiguous and show that it is not possible to determine the optimal connection phase without introducing binary variables since, no matter the case study, the highest objective function values are calculated with mixed integer OPF formulations. The difference is especially visible in a real-world low-voltage network in which the difference between different scenarios is up to 14 MW in a single day. However, binary variables make the problem computationally complex and increase computational time to several hours in the DOE calculation, even when a nonzero optimality gap is set.

[175]  arXiv:2405.20684 [pdf, other]
Title: Joint Embeddings for Graph Instruction Tuning
Comments: Conference Preprint
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)

Large Language Models (LLMs) have achieved impressive performance in text understanding and have become an essential tool for building smart assistants. Originally focusing on text, they have been enhanced with multimodal capabilities in recent works that successfully built visual instruction following assistants. As far as the graph modality goes, however, no such assistants have yet been developed. Graph structures are complex in that they represent relations between different features and are permutation invariant. Moreover, representing them in purely textual form does not always lead to good LLM performance even for finetuned models. As a result, there is a need to develop a new method to integrate graphs into LLMs for general graph understanding. This work explores the integration of the graph modality into LLMs for general graph instruction following tasks. It aims at producing a deep learning model that enhances an underlying LLM with graph embeddings and trains it to understand them and to produce, given an instruction, an answer grounded in the graph representation. The approach performs significantly better than a graph to text approach and remains consistent even for larger graphs.

[176]  arXiv:2405.20685 [pdf, other]
Title: Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

In the realm of Artificial Intelligence (AI), the importance of Explainable Artificial Intelligence (XAI) is increasingly recognized, particularly as AI models become more integral to our lives. One notable single-instance XAI approach is counterfactual explanation, which aids users in comprehending a model's decisions and offers guidance on altering these decisions. Specifically in the context of image classification models, effective image counterfactual explanations can significantly enhance user understanding. This paper introduces a novel method for computing feature importance within the feature space of a black-box model. By employing information fusion techniques, our method maximizes the use of data to address feature counterfactual explanations in the feature space. Subsequently, we utilize an image generation model to transform these feature counterfactual explanations into image counterfactual explanations. Our experiments demonstrate that the counterfactual explanations generated by our method closely resemble the original images in both pixel and feature spaces. Additionally, our method outperforms established baselines, achieving impressive experimental results.

[177]  arXiv:2405.20687 [pdf, other]
Title: Conditioning GAN Without Training Dataset
Comments: 5 pages, 2 figures, Part of my MSc project course, School Project Course 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)

Deep learning algorithms have a large number of trainable parameters, often hundreds of thousands or more. Training these algorithms requires a large amount of training data, and generating a sufficiently large dataset for them is costly \cite{noguchi2019image}.
GANs are generative neural networks in which two deep learning networks, a generator and a discriminator, compete with each other. The generator tries to generate realistic images that resemble the actual training dataset by approximating the training data distribution, and the discriminator is trained to classify images as real or fake (generated) \cite{goodfellow2016nips}. Training these GAN algorithms also requires a large amount of training data \cite{noguchi2019image}.
In this study, the aim is to address the question, "Given an unconditioned pretrained generator network and a pretrained classifier, is it feasible to develop a conditioned generator without relying on any training dataset?"
The paper begins with a general introduction to the problem. The subsequent sections are structured as follows: Section 2 provides background information on the problem. Section 3 reviews relevant literature on the topic. Section 4 outlines the methodology employed in this study. Section 5 presents the experimental results. Section 6 discusses the findings and proposes potential future research directions. Finally, Section 7 offers concluding remarks.
The implementation can be accessed at https://github.com/kidist-amde/BigGAN-PyTorch.

[178]  arXiv:2405.20690 [pdf, other]
Title: Unleashing the Potential of Diffusion Models for Incomplete Data Imputation
Subjects: Machine Learning (cs.LG)

This paper introduces DiffPuter, an iterative method for missing data imputation that leverages the Expectation-Maximization (EM) algorithm and Diffusion Models. By treating missing data as hidden variables that can be updated during model training, we frame the missing data imputation task as an EM problem. During the M-step, DiffPuter employs a diffusion model to learn the joint distribution of both the observed and currently estimated missing data. In the E-step, DiffPuter re-estimates the missing data based on the conditional probability given the observed data, utilizing the diffusion model learned in the M-step. Starting with an initial imputation, DiffPuter alternates between the M-step and E-step until convergence. Through this iterative process, DiffPuter progressively refines the complete data distribution, yielding increasingly accurate estimations of the missing data. Our theoretical analysis demonstrates that the unconditional training and conditional sampling processes of the diffusion model align precisely with the objectives of the M-step and E-step, respectively. Empirical evaluations across 10 diverse datasets and comparisons with 16 different imputation methods highlight DiffPuter's superior performance. Notably, DiffPuter achieves an average improvement of 8.10% in MAE and 5.64% in RMSE compared to the most competitive existing method.
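
The alternation between the M-step and the E-step described above can be illustrated with a much simpler stand-in model. The sketch below is not DiffPuter: it replaces the diffusion model with a bivariate Gaussian, assumes the first variable is always observed, and uses the conditional mean as the re-estimate. The function name `em_impute` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def em_impute(xs, ys, n_iters=25):
    """EM-style imputation of missing ys (marked as None) from observed xs.

    Assumption: a bivariate Gaussian stands in for the learned joint
    distribution; xs has no missing entries.
    """
    observed = [y for y in ys if y is not None]
    initial = sum(observed) / len(observed)
    # Initial imputation: fill every gap with the mean of the observed ys.
    filled = [initial if y is None else y for y in ys]
    n = len(xs)
    for _ in range(n_iters):
        # M-step: fit the joint model to observed plus currently imputed data.
        mean_x = sum(xs) / n
        mean_y = sum(filled) / n
        var_x = sum((x - mean_x) ** 2 for x in xs) / n
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, filled)) / n
        slope = cov_xy / var_x
        # E-step: re-estimate each missing y from its conditional mean given x.
        filled = [
            mean_y + slope * (x - mean_x) if y is None else y
            for x, y in zip(xs, ys)
        ]
    return filled


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
    truth = [0.9 * x + 0.436 * rng.gauss(0.0, 1.0) for x in xs]
    ys = [None if rng.random() < 0.3 else y for y in truth]
    imputed = em_impute(xs, ys)
    gaps = [i for i, y in enumerate(ys) if y is None]
    mae = sum(abs(imputed[i] - truth[i]) for i in gaps) / len(gaps)
    print(f"MAE on imputed entries: {mae:.3f}")
```

Observed entries are never overwritten; only the gaps are refined from one iteration to the next, which mirrors the structure of the loop in the abstract.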

[179]  arXiv:2405.20692 [pdf, other]
Title: In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In-context learning is a promising approach for offline reinforcement learning (RL) to handle online tasks, which can be achieved by providing task prompts. Recent works demonstrated that in-context RL could emerge with self-improvement in a trial-and-error manner when treating RL tasks as an across-episodic sequential prediction problem. Despite the self-improvement not requiring gradient updates, current works still suffer from high computational costs when the across-episodic sequence increases with task horizons. To this end, we propose an In-context Decision Transformer (IDT) to achieve self-improvement in a high-level trial-and-error manner. Specifically, IDT is inspired by the efficient hierarchical structure of human decision-making and thus reconstructs the sequence to consist of high-level decisions instead of low-level actions that interact with environments. As one high-level decision can guide multi-step low-level actions, IDT naturally avoids excessively long sequences and solves online tasks more efficiently. Experimental results show that IDT achieves state-of-the-art in long-horizon tasks over current in-context RL methods. In particular, the online evaluation time of our IDT is 36 times faster than baselines in the D4RL benchmark and 27 times faster in the Grid World benchmark.

[180]  arXiv:2405.20694 [pdf, other]
Title: Robust Stable Spiking Neural Networks
Comments: Accepted by ICML2024
Subjects: Neural and Evolutionary Computing (cs.NE)

Spiking neural networks (SNNs) are gaining popularity in deep learning due to their low energy budget on neuromorphic hardware. However, they still lack sufficient robustness to guard safety-critical applications such as autonomous driving. Many studies have been conducted to defend SNNs from the threat of adversarial attacks. This paper aims to uncover the robustness of SNNs through the lens of the stability of nonlinear systems. We are inspired by the fact that searching for parameters altering the leaky integrate-and-fire dynamics can enhance their robustness. Thus, we dive into the dynamics of membrane potential perturbation and simplify the formulation of the dynamics. We show that membrane potential perturbation dynamics can reliably convey the intensity of perturbation. Our theoretical analyses imply that the simplified perturbation dynamics satisfy input-output stability. Thus, we propose a training framework with modified SNN neurons that reduces the mean square of membrane potential perturbation, aiming at enhancing the robustness of SNNs. Finally, we experimentally verify the effectiveness of the framework in the setting of Gaussian noise training and adversarial training on the image classification task.

[181]  arXiv:2405.20697 [pdf, other]
Title: A Lightweight Method for Defending Against UAF Vulnerabilities
Authors: Xun An
Subjects: Cryptography and Security (cs.CR)

The widespread presence of Use-After-Free (UAF) vulnerabilities poses a serious threat to software security, with dangling pointers being considered the primary cause of these vulnerabilities. However, existing methods for defending against UAF vulnerabilities by eliminating dangling pointers need to interrupt the program's execution when encountering pointer assignment operations to look up the objects pointed to by the pointers and store the memory addresses of the pointers in a specific data structure. This makes these methods not lightweight. To overcome this drawback, we propose a novel approach called LightDE. This method does not require storing the memory addresses of pointers or locating the objects pointed to by pointers during program execution. LightDE uses our proposed structure-sensitive pointer analysis method to determine the objects pointed to by pointers and stores the pointing relationships in the program's data segment during program compilation. Since LightDE only needs to check whether the pointers identified by the pointer analysis point to the released objects when the objects are released, LightDE is very lightweight. Our experimental results show that LightDE can effectively defend against UAF vulnerabilities, and the additional performance overhead it introduces is very low.

[182]  arXiv:2405.20700 [pdf, other]
Title: Self-degraded contrastive domain adaptation for industrial fault diagnosis with bi-imbalanced data
Subjects: Artificial Intelligence (cs.AI)

Modern industrial fault diagnosis tasks often face the combined challenge of distribution discrepancy and bi-imbalance. Existing domain adaptation approaches pay little attention to the prevailing bi-imbalance, leading to poor domain adaptation performance or even negative transfer. In this work, we propose a self-degraded contrastive domain adaptation (Sd-CDA) diagnosis framework to handle the domain discrepancy under the bi-imbalanced data. It first pre-trains the feature extractor via imbalance-aware contrastive learning based on model pruning to learn the feature representation efficiently in a self-supervised manner. Then it forces the samples away from the domain boundary based on supervised contrastive domain adversarial learning (SupCon-DA) and ensures the features generated by the feature extractor are discriminative enough. Furthermore, we propose the pruned contrastive domain adversarial learning (PSupCon-DA) to pay automatically re-weighted attention to the minorities to enhance the performance towards bi-imbalanced data. We show the superiority of the proposed method via two experiments.

[183]  arXiv:2405.20701 [pdf, other]
Title: Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) demonstrate exceptional instruct-following ability to complete various downstream tasks. Although this impressive ability makes LLMs flexible task solvers, their performance in solving tasks also heavily relies on instructions. In this paper, we reveal that LLMs are over-sensitive to lexical variations in task instructions, even when the variations are imperceptible to humans. By providing models with neighborhood instructions, which are closely situated in the latent representation space and differ by only one semantically similar word, the performance on downstream tasks can be vastly different. Following this property, we propose a black-box Combinatorial Optimization framework for Prompt Lexical Enhancement (COPLE). COPLE performs iterative lexical optimization according to the feedback from a batch of proxy tasks, using a search strategy related to word influence. Experiments show that even widely-used human-crafted prompts for current benchmarks suffer from the lexical sensitivity of models, and COPLE recovers the declined model ability in both instruct-following and solving downstream tasks.

[184]  arXiv:2405.20703 [pdf, other]
Title: It is Simple Sometimes: A Study On Improving Aspect-Based Sentiment Analysis Performance
Comments: Accepted to ACL Findings 2024
Subjects: Computation and Language (cs.CL)

Aspect-Based Sentiment Analysis (ABSA) involves extracting opinions from textual data about specific entities and their corresponding aspects through various complementary subtasks. Several prior studies have focused on developing ad hoc designs of varying complexities for these subtasks. In this paper, we present a generative framework extensible to any ABSA subtask. We build upon the instruction tuned model proposed by Scaria et al. (2023), who present an instruction-based model with task descriptions followed by in-context examples on ABSA subtasks. We propose PFInstruct, an extension to this instruction learning paradigm by appending an NLP-related task prefix to the task description. This simple approach leads to improved performance across all tested SemEval subtasks, surpassing previous state-of-the-art (SOTA) on the ATE subtask (Rest14) by +3.28 F1-score, and on the AOOE subtask by an average of +5.43 F1-score across SemEval datasets. Furthermore, we explore the impact of the prefix-enhanced prompt quality on the ABSA subtasks and find that even a noisy prefix enhances model performance compared to the baseline. Our method also achieves competitive results on a biomedical domain dataset (ERSA).

[185]  arXiv:2405.20704 [pdf, other]
Title: A flexible numerical tool for large dynamic DC networks
Comments: 17 pages, 5 figures, 3 tables. First version, all comments are welcome
Subjects: Systems and Control (eess.SY)

DC networks play an important role within the ongoing energy transition. In this context, simulations of designed and existing networks and their corresponding assets are a core tool to gain insights and support decision-making. These simulations of DC networks are executed in the time domain. Due to the high frequencies involved and the controllers used, the equations that model these DC networks are stiff and highly oscillatory differential equations. By exploiting sparsity, we show that conventional adaptive time stepping schemes can be used efficiently for the time domain simulation of very large DC networks and that the computational cost scales linearly as the size of the networks increases.

[186]  arXiv:2405.20705 [pdf, other]
Title: ADESSE: Advice Explanations in Complex Repeated Decision-Making Environments
Subjects: Artificial Intelligence (cs.AI)

In the evolving landscape of human-centered AI, fostering a synergistic relationship between humans and AI agents in decision-making processes stands as a paramount challenge. This work considers a problem setup where an intelligent agent comprising a neural network-based prediction component and a deep reinforcement learning component provides advice to a human decision-maker in complex repeated decision-making environments. Whether the human decision-maker would follow the agent's advice depends on their beliefs and trust in the agent and on their understanding of the advice itself. To this end, we developed an approach named ADESSE to generate explanations about the adviser agent to improve human trust and decision-making. Computational experiments on a range of environments with varying model sizes demonstrate the applicability and scalability of ADESSE. Furthermore, an interactive game-based user study shows that participants were significantly more satisfied, achieved a higher reward in the game, and took less time to select an action when presented with explanations generated by ADESSE. These findings illuminate the critical role of tailored, human-centered explanations in AI-assisted decision-making.

[187]  arXiv:2405.20708 [pdf, other]
Title: FinGen: A Dataset for Argument Generation in Finance
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Thinking about the future is one of the important activities that people do in daily life. Futurists also put a lot of effort into figuring out possible scenarios for the future. We argue that the exploration of this direction is still at an early stage in NLP research. To this end, we propose three argument generation tasks in the financial application scenario. Our experimental results show these tasks are still big challenges for representative generation models. Based on our empirical results, we further point out several unresolved issues and challenges in this research direction.

[188]  arXiv:2405.20710 [pdf, other]
Title: Information Maximization via Variational Autoencoders for Cross-Domain Recommendation
Subjects: Information Retrieval (cs.IR)

Cross-Domain Sequential Recommendation (CDSR) methods aim to address the data sparsity and cold-start problems present in Single-Domain Sequential Recommendation (SDSR). Existing CDSR methods typically rely on overlapping users, designing complex cross-domain modules to capture users' latent interests that can propagate across different domains. However, the information they propagate is limited to overlapping users and users who have rich historical behavior records. As a result, these methods often underperform in real-world scenarios, where most users are non-overlapping (cold-start) and long-tailed. In this research, we introduce a new CDSR framework named Information Maximization Variational Autoencoder (IM-VAE). Here, we suggest using a Pseudo-Sequence Generator to enhance the user's interaction history input for downstream fine-grained CDSR models to alleviate the cold-start issues. We also propose a Generative Recommendation Framework combined with three regularizers inspired by the mutual information maximization (MIM) theory \cite{mcgill1954multivariate} to capture the semantic differences between a user's interests shared across domains and those specific to certain domains, as well as address the informational gap between a user's actual interaction sequences and the pseudo-sequences generated. To the best of our knowledge, this paper is the first CDSR work that considers the information disentanglement and denoising of pseudo-sequences in the open-world recommendation scenario. Empirical experiments illustrate that IM-VAE outperforms the state-of-the-art approaches on two real-world cross-domain datasets on all sorts of users, including cold-start and tailed users, demonstrating the effectiveness of IM-VAE in open-world recommendation.

[189]  arXiv:2405.20711 [pdf, other]
Title: Revisiting Mutual Information Maximization for Generalized Category Discovery
Comments: Preprint version
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generalized category discovery presents a challenge in a realistic scenario, which requires the model's generalization ability to recognize unlabeled samples from known and unknown categories. This paper revisits the challenge of generalized category discovery through the lens of information maximization (InfoMax) with a probabilistic parametric classifier. Our findings reveal that ensuring independence between known and unknown classes while concurrently assuming a uniform probability distribution across all classes yields an enlarged margin among known and unknown classes that promotes the model's performance. To achieve the aforementioned independence, we propose a novel InfoMax-based method, Regularized Parametric InfoMax (RPIM), which adopts pseudo labels to supervise unlabeled samples during InfoMax, while proposing a regularization to ensure the quality of the pseudo labels. Additionally, we introduce a novel semantic-bias transformation to refine the features from the pre-trained model instead of direct fine-tuning to reduce the computational costs. Extensive experiments on six benchmark datasets validate the effectiveness of our method. RPIM significantly improves the performance regarding unknown classes, surpassing the state-of-the-art method by an average margin of 3.5%.

[190]  arXiv:2405.20713 [pdf, ps, other]
Title: Fast Evaluation of S-boxes with Garbled Circuits
Comments: 15 pages, published in IEEE Transactions on Information Forensics and Security vol. 19
Journal-ref: IEEE Transactions on Information Forensics and Security, vol. 19, pp. 5530-5544, 2024.
Subjects: Cryptography and Security (cs.CR)

Garbling schemes are vital primitives for privacy-preserving protocols and secure two-party computation. This paper presents a projective garbling scheme that assigns $2^n$ values to wires in a circuit comprising XOR and unary projection gates. A generalization of FreeXOR allows the XOR of wires with $2^n$ values to be very efficient. We then analyze the performance of our scheme by evaluating substitution-permutation ciphers. Using our proposal, we measure high-speed evaluation of the ciphers with a moderately increased cost in garbling and bandwidth. Theoretical analysis suggests that for evaluating the nine examined ciphers, one can expect a 4- to 70-fold improvement in evaluation performance with, at most, a 4-fold increase in garbling cost and, at most, an 8-fold increase in communication cost compared to the Half-Gates (Zahur, Rosulek and Evans; Eurocrypt'15) and ThreeHalves (Rosulek and Roy; Crypto'21) garbling schemes. In an offline/online setting, such as secure function evaluation as a service, the circuit garbling and communication to the evaluator can proceed in the offline phase. Thus, our scheme offers a fast online phase. Furthermore, we present efficient Boolean circuits for the S-boxes of TWINE and Midori64 ciphers. To our knowledge, our formulas give the smallest number of AND gates for the S-boxes of these two ciphers.
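
For orientation, the classic binary FreeXOR idea that the abstract says it generalizes can be sketched as follows. This is only the standard two-valued construction, not the paper's scheme for wires with $2^n$ values; the helper names and the 16-byte label length are assumptions made for illustration, and details such as point-and-permute bits are omitted.

```python
import secrets

LABEL_BYTES = 16  # assumed label length for this sketch


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def new_wire(delta):
    """Return (label_for_0, label_for_1) with label_1 = label_0 XOR delta."""
    zero_label = secrets.token_bytes(LABEL_BYTES)
    return zero_label, xor_bytes(zero_label, delta)


def garble_xor(wire_a, wire_b):
    """Derive the output wire of an XOR gate; no garbled table is produced."""
    zero_label = xor_bytes(wire_a[0], wire_b[0])
    one_label = xor_bytes(wire_a[1], wire_b[0])
    return zero_label, one_label


def evaluate_xor(label_a, label_b):
    """The evaluator XORs the two labels it holds, with no decryption."""
    return xor_bytes(label_a, label_b)


if __name__ == "__main__":
    delta = secrets.token_bytes(LABEL_BYTES)  # global secret offset
    wire_a, wire_b = new_wire(delta), new_wire(delta)
    wire_c = garble_xor(wire_a, wire_b)
    for a in (0, 1):
        for b in (0, 1):
            assert evaluate_xor(wire_a[a], wire_b[b]) == wire_c[a ^ b]
    print("XOR gate evaluated correctly for all four input pairs")
```

Because every wire shares the same offset, the XOR of two input labels is always a valid output label, which is why XOR gates cost no ciphertexts in this construction.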

[191]  arXiv:2405.20715 [pdf, other]
Title: Transforming Japan Real Estate
Authors: Diabul Haque
Subjects: Computational Engineering, Finance, and Science (cs.CE); Econometrics (econ.EM); Statistical Finance (q-fin.ST)

The Japanese real estate market, valued over 35 trillion USD, offers significant investment opportunities. Accurate rent and price forecasting could provide a substantial competitive edge. This paper explores using alternative data variables to predict real estate performance in 1100 Japanese municipalities. A comprehensive house price index was created, covering all municipalities from 2005 to the present, using a dataset of over 5 million transactions. This core dataset was enriched with economic factors spanning decades, allowing for price trajectory predictions.
The findings show that alternative data variables can indeed forecast real estate performance effectively. Investment signals based on these variables yielded notable returns with low volatility. For example, the net migration ratio delivered an annualized return of 4.6% with a Sharpe ratio of 1.5. Taxable income growth and new dwellings ratio also performed well, with annualized returns of 4.1% (Sharpe ratio of 1.3) and 3.3% (Sharpe ratio of 0.9), respectively. When combined with transformer models to predict risk-adjusted returns 4 years in advance, the model achieved an R-squared score of 0.28, explaining nearly 30% of the variation in future municipality prices.
These results highlight the potential of alternative data variables in real estate investment. They underscore the need for further research to identify more predictive factors. Nonetheless, the evidence suggests that such data can provide valuable insights into real estate price drivers, enabling more informed investment decisions in the Japanese market.
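
As a reminder of how figures such as the annualized returns and Sharpe ratios quoted above are commonly obtained, here is a minimal sketch. It assumes periodic simple returns, a constant per-period risk-free rate, and the usual square-root-of-time annualization; the abstract does not state the paper's exact convention, and the function name and sample numbers are hypothetical.

```python
import math
import statistics


def annualized_sharpe(returns, risk_free=0.0, periods_per_year=1):
    """Mean excess return over its sample standard deviation, annualized."""
    excess = [r - risk_free for r in returns]
    mean_excess = statistics.mean(excess)
    volatility = statistics.stdev(excess)  # sample standard deviation
    return mean_excess / volatility * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical yearly returns of an investment signal.
    yearly_returns = [0.01, 0.03, 0.02]
    print(round(annualized_sharpe(yearly_returns), 6))
```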

[192]  arXiv:2405.20717 [pdf, other]
Title: Cyclic image generation using chaotic dynamics
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)

Successive image generation using cyclic transformations is demonstrated by extending the CycleGAN model to transform images among three different categories. Repeated application of the trained generators produces sequences of images that transition among the different categories. The generated image sequences occupy a more limited region of the image space compared with the original training dataset. Quantitative evaluation using precision and recall metrics indicates that the generated images have high quality but reduced diversity relative to the training dataset. Such successive generation processes are characterized as chaotic dynamics in terms of dynamical system theory. Positive Lyapunov exponents estimated from the generated trajectories confirm the presence of chaotic dynamics, with the Lyapunov dimension of the attractor found to be comparable to the intrinsic dimension of the training data manifold. The results suggest that chaotic dynamics in the image space defined by the deep generative model contribute to the diversity of the generated images, constituting a novel approach for multi-class image generation. This model can be interpreted as an extension of classical associative memory to perform hetero-association among image categories.
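
To make the role of the Lyapunov exponent concrete, the sketch below estimates it for a toy one-dimensional system, the logistic map, rather than for trajectories in image space. The choice of map, the parameters, and the function names are assumptions for illustration only. For r = 4 the theoretical exponent is ln 2, so a positive estimate near that value signals chaotic dynamics.

```python
import math


def logistic(x, r=4.0):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def lyapunov_exponent(x0=0.3, r=4.0, burn_in=100, steps=20000):
    """Average of log|f'(x)| along a trajectory of the logistic map."""
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = logistic(x, r)
    total = 0.0
    for _ in range(steps):
        # The derivative of the map at x is r * (1 - 2x).
        total += math.log(abs(r * (1.0 - 2.0 * x)))
        x = logistic(x, r)
    return total / steps


if __name__ == "__main__":
    estimate = lyapunov_exponent()
    print(f"estimated exponent: {estimate:.3f}, ln 2 = {math.log(2):.3f}")
```

For high-dimensional systems such as a chain of generators, the exponent is usually estimated from the divergence rate of nearby trajectories instead of an analytic derivative.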

[193]  arXiv:2405.20718 [pdf, other]
Title: Popularity-Aware Alignment and Contrast for Mitigating Popularity Bias
Comments: Accepted by KDD 2024
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Collaborative Filtering (CF) typically suffers from the significant challenge of popularity bias due to the uneven distribution of items in real-world datasets. This bias leads to a significant accuracy gap between popular and unpopular items. It not only hinders accurate user preference understanding but also exacerbates the Matthew effect in recommendation systems. To alleviate popularity bias, existing efforts focus on emphasizing unpopular items or separating the correlation between item representations and their popularity. Despite their effectiveness, existing works still face two persistent challenges: (1) how to extract common supervision signals from popular items to improve the unpopular item representations, and (2) how to alleviate the representation separation caused by popularity bias. In this work, we conduct an empirical analysis of popularity bias and propose Popularity-Aware Alignment and Contrast (PAAC) to address these two challenges. Specifically, we use the common supervisory signals modeled in popular item representations and propose a novel popularity-aware supervised alignment module to learn unpopular item representations. Additionally, we suggest re-weighting the contrastive learning loss to mitigate the representation separation from a popularity-centric perspective. Finally, we validate the effectiveness and rationale of PAAC in mitigating popularity bias through extensive experiments on three real-world datasets. Our code is available at https://github.com/miaomiao-cai2/KDD2024-PAAC.

[194]  arXiv:2405.20719 [pdf, other]
Title: Climate Variable Downscaling with Conditional Normalizing Flows
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Atmospheric and Oceanic Physics (physics.ao-ph)

Predictions of global climate models typically operate on coarse spatial scales due to the large computational costs of climate simulations. This has led to a considerable interest in methods for statistical downscaling, a similar process to super-resolution in the computer vision context, to provide more local and regional climate information. In this work, we apply conditional normalizing flows to the task of climate variable downscaling. We showcase its successful performance on an ERA5 water content dataset for different upsampling factors. Additionally, we show that the method allows us to assess the predictive uncertainty in terms of standard deviation from the fitted conditional distribution mean.

[195]  arXiv:2405.20720 [pdf, other]
Title: Power of Cooperative Supervision: Multiple Teachers Framework for Enhanced 3D Semi-Supervised Object Detection
Comments: under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

To ensure safe urban driving for autonomous platforms, it is crucial not only to develop high-performance object detection techniques but also to establish a diverse and representative dataset that captures various urban environments and object characteristics. To address these two issues, we have constructed a multi-class 3D LiDAR dataset reflecting diverse urban environments and object characteristics, and developed a robust 3D semi-supervised object detection (SSOD) method based on a multiple teachers framework. This SSOD framework categorizes similar classes and assigns specialized teachers to each category. Through collaborative supervision among these category-specialized teachers, the student network becomes increasingly proficient, leading to a highly effective object detector. We propose a simple yet effective augmentation technique, Pie-based Point Compensating Augmentation (PieAug), to enable the teacher network to generate high-quality pseudo-labels. Extensive experiments on the WOD, KITTI, and our datasets validate the effectiveness of our proposed method and the quality of our dataset. Experimental results demonstrate that our approach consistently outperforms existing state-of-the-art 3D semi-supervised object detection methods across all datasets. We plan to release our multi-class LiDAR dataset and the source code on our GitHub repository in the near future.

[196]  arXiv:2405.20721 [pdf, other]
Title: ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recently, 3D Gaussian Splatting (3DGS) has become a promising framework for novel view synthesis, offering fast rendering speeds and high fidelity. However, the large number of Gaussians and their associated attributes require effective compression techniques. Existing methods primarily compress neural Gaussians individually and independently, i.e., coding all the neural Gaussians at the same time, with little design for their interactions and spatial dependence. Inspired by the effectiveness of the context model in image compression, we propose the first autoregressive model at the anchor level for 3DGS compression in this work. We divide anchors into different levels and the anchors that are not coded yet can be predicted based on the already coded ones in all the coarser levels, leading to more accurate modeling and higher coding efficiency. To further improve the efficiency of entropy coding, e.g., to code the coarsest level with no already coded anchors, we propose to introduce a low-dimensional quantized feature as the hyperprior for each anchor, which can be effectively compressed. Our work pioneers the context model in the anchor level for 3DGS representation, yielding an impressive size reduction of over 100 times compared to vanilla 3DGS and 15 times compared to the most recent state-of-the-art work Scaffold-GS, while achieving comparable or even higher rendering quality.

[197]  arXiv:2405.20722 [pdf, other]
Title: Formal Verification of Ecosystem Restoration Requirements using UML and Alloy
Journal-ref: 4th International Conference on Advances in Software Engineering, Volume 13, Number 16, 2023
Subjects: Software Engineering (cs.SE)

The United Nations has declared the current decade (2021-2030) as the "UN Decade on Ecosystem Restoration" to join R&D forces to fight against the ongoing environmental crisis. Given the ongoing degradation of earth ecosystems and the related crucial services that they offer to human society, ecosystem restoration has become a major society-critical issue. Software applications managing ecosystem restoration must be developed rigorously. Reliable models of ecosystems and restoration goals are necessary. This paper proposes a rigorous approach for ecosystem requirements modeling using formal methods from a model-driven software engineering point of view. The authors describe the main concepts at stake with a metamodel in UML and introduce a formalization of this metamodel in Alloy. The formal model is executed with Alloy Analyzer, and safety and liveness properties are checked against it. This approach helps ensure that ecosystem specifications are reliable and that the specified ecosystem meets the desired restoration goals, seen in our approach as liveness and safety properties. The concepts and activities of the approach are illustrated with CRESTO, a real-world running example of a restored Costa Rican ecosystem.

[198]  arXiv:2405.20724 [pdf, other]
Title: Learning on Large Graphs using Intersecting Communities
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

Message Passing Neural Networks (MPNNs) are a staple of graph machine learning. MPNNs iteratively update each node's representation in an input graph by aggregating messages from the node's neighbors, which necessitates a memory complexity of the order of the number of graph edges. This complexity might quickly become prohibitive for large graphs provided they are not very sparse. In this paper, we propose a novel approach to alleviate this problem by approximating the input graph as an intersecting community graph (ICG) -- a combination of intersecting cliques. The key insight is that the number of communities required to approximate a graph does not depend on the graph size. We develop a new constructive version of the Weak Graph Regularity Lemma to efficiently construct an approximating ICG for any input graph. We then devise an efficient graph learning algorithm operating directly on ICG in linear memory and time with respect to the number of nodes (rather than edges). This offers a new and fundamentally different pipeline for learning on very large non-sparse graphs, whose applicability is demonstrated empirically on node classification tasks and spatio-temporal data processing.

[199]  arXiv:2405.20725 [pdf, other]
Title: GI-NAS: Boosting Gradient Inversion Attacks through Adaptive Neural Architecture Search
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Gradient Inversion Attacks invert the transmitted gradients in Federated Learning (FL) systems to reconstruct the sensitive data of local clients and have raised considerable privacy concerns. A majority of gradient inversion methods rely heavily on explicit prior knowledge (e.g., a well pre-trained generative model), which is often unavailable in realistic scenarios. To alleviate this issue, researchers have proposed to leverage the implicit prior knowledge of an over-parameterized network. However, they only utilize a fixed neural architecture for all the attack settings. This would hinder the adaptive use of implicit architectural priors and consequently limit the generalizability. In this paper, we further exploit such implicit prior knowledge by proposing Gradient Inversion via Neural Architecture Search (GI-NAS), which adaptively searches the network and captures the implicit priors behind neural architectures. Extensive experiments verify that our proposed GI-NAS can achieve superior attack performance compared to state-of-the-art gradient inversion methods, even under more practical settings with high-resolution images, large-sized batches, and advanced defense strategies.

[200]  arXiv:2405.20727 [pdf, other]
Title: GANcrop: A Contrastive Defense Against Backdoor Attacks in Federated Learning
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

With heightened awareness of data privacy protection, Federated Learning (FL) has attracted widespread attention as a privacy-preserving distributed machine learning method. However, the distributed nature of federated learning also provides opportunities for backdoor attacks, where attackers can guide the model to produce incorrect predictions without affecting the global model training process.
This paper introduces a novel defense mechanism against backdoor attacks in federated learning, named GANcrop. This approach leverages contrastive learning to deeply explore the disparities between malicious and benign models for attack identification, followed by the utilization of Generative Adversarial Networks (GAN) to recover backdoor triggers and implement targeted mitigation strategies. Experimental findings demonstrate that GANcrop effectively safeguards against backdoor attacks, particularly in non-IID scenarios, while maintaining satisfactory model accuracy, showcasing its remarkable defensive efficacy and practical utility.

[201]  arXiv:2405.20729 [pdf, other]
Title: Extreme Point Supervised Instance Segmentation
Comments: CVPR 2024 Accepted
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces a novel approach to learning instance segmentation using extreme points, i.e., the topmost, leftmost, bottommost, and rightmost points, of each object. These points are readily available in the modern bounding box annotation process while offering strong clues for precise segmentation, and thus allow performance to be improved at the same annotation cost as box-supervised methods. Our work considers extreme points as a part of the true instance mask and propagates them to identify potential foreground and background points, which are all used together for training a pseudo label generator. The pseudo labels given by the generator are then used in turn for supervised learning of our final model. On three public benchmarks, our method significantly outperforms existing box-supervised methods, further narrowing the gap with its fully supervised counterpart. In particular, our model generates high-quality masks when a target object is separated into multiple parts, where previous box-supervised methods often fail.
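
As a small illustration of the definition above, the following sketch extracts the four extreme points from a binary mask. It only illustrates what "extreme points" means; it is not the paper's training pipeline, and the function name `extreme_points` and the `(row, col)` convention are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (row, col); hypothetical convention for this sketch


def extreme_points(mask: List[List[int]]) -> Dict[str, Point]:
    """Return one topmost, leftmost, bottommost, and rightmost foreground pixel."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask has no foreground pixels")
    return {
        "top": min(pixels, key=lambda p: p[0]),     # smallest row index
        "left": min(pixels, key=lambda p: p[1]),    # smallest column index
        "bottom": max(pixels, key=lambda p: p[0]),  # largest row index
        "right": max(pixels, key=lambda p: p[1]),   # largest column index
    }


# Toy diamond-shaped object.
mask = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]
points = extreme_points(mask)
```

The four points together determine the tight bounding box, which is why they come at no extra cost when annotators click extreme points to draw boxes.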

[202]  arXiv:2405.20731 [pdf, other]
Title: Maximum Temperature Prediction Using Remote Sensing Data Via Convolutional Neural Network
Comments: 4 pages, submitted to IEEE MetroLivEnv 2024 conference
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Urban heat islands, defined as specific zones exhibiting substantially higher temperatures than their immediate environs, pose significant threats to environmental sustainability and public health. This study introduces a novel machine-learning model that amalgamates data from the Sentinel-3 satellite, meteorological predictions, and additional remote sensing inputs. The primary aim is to generate detailed spatiotemporal maps that forecast the peak temperatures within a 24-hour period in Turin. Experimental results validate the model's proficiency in predicting temperature patterns, achieving a Mean Absolute Error (MAE) of 2.09 degrees Celsius for the year 2023 at a resolution of 20 meters per pixel, thereby enriching our knowledge of urban climatic behavior. This investigation enhances the understanding of urban microclimates, emphasizing the importance of cross-disciplinary data integration, and laying the groundwork for informed policy-making aimed at alleviating the negative impacts of extreme urban temperatures.

[203]  arXiv:2405.20733 [pdf, other]
Title: Dynamic Microgrid Formation Considering Time-dependent Contingency: A Distributionally Robust Approach
Comments: 5 pages, 5 figures, Accepted by PES General Meeting 2024
Subjects: Systems and Control (eess.SY)

The increasing frequency of extreme weather events has posed significant risks to the operation of power grids. During long-duration extreme weather events, microgrid formation (MF) is an essential solution to enhance the resilience of the distribution systems by proactively partitioning the distribution system into several microgrids to mitigate the impact of contingencies. This paper proposes a distributionally robust dynamic microgrid formation (DR-DMF) approach to fully consider the temporal characteristics of line failure probability during long-duration extreme weather events like typhoons. The boundaries of each microgrid are dynamically adjusted to enhance the resilience of the system. Furthermore, the expected load shedding is minimized by a distributionally robust optimization model considering the uncertainty of line failure probability regarding the worst-case distribution of contingencies. The effectiveness of the proposed model is verified by numerical simulations on a modified IEEE 37-node system.

[204]  arXiv:2405.20735 [pdf, other]
Title: Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images
Comments: © 2024 IEEE. Accepted in 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision-language models have emerged as a powerful tool for previously challenging multi-modal classification problems in the medical domain. This development has led to the exploration of automated image description generation for multi-modal clinical scans, particularly for radiology report generation. Existing research has focused on clinical descriptions for specific modalities or body regions, leaving a gap for a model providing entire-body multi-modal descriptions. In this paper, we address this gap by automating the generation of standardized body station(s) and a list of organ(s) across the whole body in multi-modal MR and CT radiological images. Leveraging the versatility of Contrastive Language-Image Pre-training (CLIP), we refine and augment the existing approach through multiple experiments, including baseline model fine-tuning, adding station(s) as a superset for better correlation between organs, and image and language augmentations. Our proposed approach demonstrates a 47.6% performance improvement over the baseline PubMedCLIP.

[205]  arXiv:2405.20738 [pdf, other]
Title: Federated Random Forest for Partially Overlapping Clinical Data
Subjects: Machine Learning (cs.LG)

In the healthcare sector, awareness of data privacy and the corresponding data protection regulations, as well as heterogeneous and non-harmonized data, pose huge challenges to large-scale data analysis. Moreover, clinical data often involves partially overlapping features, as some observations may be missing for various reasons, such as differences in procedures, diagnostic tests, or other recorded patient history information across hospitals or institutes. To address the challenges posed by partially overlapping features and incomplete data in clinical datasets, a comprehensive approach is required. Particularly in the domain of medical data, promising outcomes are achieved by federated random forests whenever features align. However, for most standard algorithms, like random forest, it is essential that all data sets have identical parameters. Therefore, in this work the concept of federated random forest is adapted to a setting with partially overlapping features. Moreover, our research assesses the effectiveness of the newly developed federated random forest models for partially overlapping clinical data. For aggregating the federated, globally optimized model, only features available locally at each site can be used. We tackled two issues in federation: (i) the quantity of involved parties, (ii) the varying overlap of features. This evaluation was conducted across three clinical datasets. Even in cases where only a subset of features overlaps, the federated random forest model consistently demonstrates superior performance compared to its local counterpart. This holds true across various scenarios, including datasets with imbalanced classes. Consequently, federated random forests for partially overlapped data offer a promising solution to transcend barriers in collaborative research and corporate cooperation.

[206]  arXiv:2405.20743 [pdf, other]
Title: Trajectory Forecasting through Low-Rank Adaptation of Discrete Latent Codes
Comments: 15 pages, 3 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

Trajectory forecasting is crucial for video surveillance analytics, as it enables the anticipation of future movements for a set of agents, e.g., basketball players engaged in intricate interactions with long-term intentions. Deep generative models offer a natural learning approach for trajectory forecasting, yet they encounter difficulties in achieving an optimal balance between sampling fidelity and diversity. We address this challenge by leveraging Vector Quantized Variational Autoencoders (VQ-VAEs), which utilize a discrete latent space to tackle the issue of posterior collapse. Specifically, we introduce an instance-based codebook that allows tailored latent representations for each example. In a nutshell, the rows of the codebook are dynamically adjusted to reflect contextual information (i.e., past motion patterns extracted from the observed trajectories). In this way, the discretization process gains flexibility, leading to improved reconstructions. Notably, instance-level dynamics are injected into the codebook through low-rank updates, which restrict the customization of the codebook to a lower-dimensional space. The resulting discrete space serves as the basis of the subsequent step, which concerns the training of a diffusion-based predictive model. We show that such a two-fold framework, augmented with instance-level discretization, leads to accurate and diverse forecasts, yielding state-of-the-art performance on three established benchmarks.

[207]  arXiv:2405.20745 [pdf, other]
Title: Practical Modelling with Bigraphs
Comments: 34 pages
Subjects: Logic in Computer Science (cs.LO); Symbolic Computation (cs.SC); Software Engineering (cs.SE)

Bigraphs are a versatile modelling formalism that allows easy expression of placement and connectivity relations in a graphical format. System evolution is user defined as a set of rewrite rules. This paper presents a practical, yet detailed guide to developing, executing, and reasoning about bigraph models, including recent extensions such as parameterised, instantaneous, prioritised and conditional rules, and probabilistic and stochastic rewriting.

[208]  arXiv:2405.20748 [pdf, other]
Title: OpenTensor: Reproducing Faster Matrix Multiplication Discovering Algorithms
Authors: Yiwen Sun, Wenye Li
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

OpenTensor is a reproduction of AlphaTensor, which used Deep Reinforcement Learning (DRL) to discover a new algorithm that outperforms the state-of-the-art methods for matrix multiplication. While AlphaTensor provides a promising framework for solving scientific problems, it is hard to reproduce because of the many tricks involved and the lack of source code. In this paper, we clean up the algorithm pipeline, clarify the technical details, and make some improvements to the training process. Computational results show that OpenTensor can successfully find efficient matrix multiplication algorithms.

[209]  arXiv:2405.20750 [pdf, other]
Title: Diffusion Models Are Innate One-Step Generators
Comments: 9 pages, 4 figures and 4 tables on the main contents
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion Models (DMs) have achieved great success in image generation and other fields. By fine sampling through the trajectory defined by the SDE/ODE solver based on a well-trained score model, DMs can generate remarkably high-quality results. However, this precise sampling often requires multiple steps and is computationally demanding. To address this problem, instance-based distillation methods have been proposed to distill a one-step generator from a DM by having a simpler student model mimic a more complex teacher model. Yet, our research reveals an inherent limitation in these methods: the teacher model, with more steps and more parameters, occupies different local minima compared to the student model, leading to suboptimal performance when the student model attempts to replicate the teacher. To avoid this problem, we introduce a novel distributional distillation method, which uses an exclusive distributional loss. This method exceeds state-of-the-art (SOTA) results while requiring significantly fewer training images. Additionally, we show that DMs' layers are activated differently at different time steps, leading to an inherent capability to generate images in a single step. Freezing most of the convolutional layers in a DM during distributional distillation leads to further performance improvements. Our method achieves SOTA results on CIFAR-10 (FID 1.54), AFHQv2 64x64 (FID 1.23), FFHQ 64x64 (FID 0.85) and ImageNet 64x64 (FID 1.16) with great efficiency. Most of those results are obtained with only 5 million training images within 6 hours on 8 A100 GPUs. This breakthrough not only enhances the understanding of efficient image generation models but also offers a scalable framework for advancing the state of the art in various applications.

[210]  arXiv:2405.20753 [pdf, ps, other]
Title: The generating power of weighted tree automata with initial algebra semantics
Comments: 20 pages, 2 figures
Subjects: Formal Languages and Automata Theory (cs.FL)

We consider the images of the initial algebra semantics of weighted tree automata over strong bimonoids (hence also over semirings). These images are subsets of the carrier set of the underlying strong bimonoid. We consider locally finite, weakly locally finite, and bi-locally finite strong bimonoids. We show that there exists a strong bimonoid which is weakly locally finite and not locally finite. We also show that if the ranked alphabet contains a binary symbol, then for any finitely generated strong bimonoid, weighted tree automata can generate, via their initial algebra semantics, all elements of the strong bimonoid. As a consequence of these results, for weakly locally finite strong bimonoids which are not locally finite, weighted tree automata can generate infinite images provided that the input ranked alphabet contains at least one binary symbol. This is in sharp contrast to the setting of weighted string automata, where each such image is known to be finite. As a further consequence, for any finitely generated semiring, there exists a weighted tree automaton which generates, via its run semantics, all elements of the semiring.

[211]  arXiv:2405.20755 [pdf, ps, other]
Title: Improving code-mixed hate detection by native sample mixing: A case study for Hindi-English code-mixed scenario
Comments: Generated from XeLaTeX
Subjects: Computation and Language (cs.CL)

Hate detection has long been a challenging task for the NLP community. The task becomes complex in a code-mixed environment because the models must understand the context and the hate expressed through language alteration. Compared to the monolingual setup, we see very little work on code-mixed hate, as large-scale annotated hate corpora are unavailable for such studies. To overcome this bottleneck, we propose using native language hate samples. We hypothesise that in the era of multilingual language models (MLMs), hate in code-mixed settings can be detected by relying mainly on the native language samples. Even though the NLP literature reports the effectiveness of MLMs on hate detection in many cross-lingual settings, their extensive evaluation in a code-mixed scenario is yet to be done. This paper attempts to fill this gap through rigorous empirical experiments. We considered the Hindi-English code-mixed setup as a case study, as we have the linguistic expertise for it. Some of the interesting observations we made are: (i) adding native hate samples to the code-mixed training set, even in small quantities, improved the performance of MLMs for code-mixed hate detection, (ii) MLMs trained with native samples alone were observed to detect code-mixed hate to a large extent, (iii) the visualisation of attention scores revealed that, when native samples were included in training, MLMs could better focus on the hate-emitting words in the code-mixed context, and (iv) finally, when hate is subjective or sarcastic, naively mixing native samples doesn't help much to detect code-mixed hate. We will release the data and code repository to reproduce the reported results.

[212]  arXiv:2405.20759 [pdf, other]
Title: Information Theoretic Text-to-Image Alignment
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Diffusion models for Text-to-Image (T2I) conditional generation have seen tremendous success recently. Despite their success, accurately capturing user intentions with these models still requires a laborious trial and error process. This challenge is commonly identified as a model alignment problem, an issue that has attracted considerable attention from the research community. Instead of relying on fine-grained linguistic analyses of prompts, human annotation, or auxiliary vision-language models to steer image generation, in this work we present a novel method that relies on an information-theoretic alignment measure. In a nutshell, our method uses self-supervised fine-tuning and relies on point-wise mutual information between prompts and images to define a synthetic training set to induce model alignment. Our comparative analysis shows that our method is on par with or superior to the state-of-the-art, yet requires nothing but a pre-trained denoising network to estimate MI and a lightweight fine-tuning strategy.
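
For reference, the standard definition of point-wise mutual information between a prompt $y$ and an image $x$ is given below. The symbols $x$, $y$, and $p$ are notation assumed for this note; the abstract does not state how the quantity is estimated beyond using a pre-trained denoising network.

```latex
\[
  \operatorname{pmi}(x; y)
  \;=\; \log \frac{p(x, y)}{p(x)\, p(y)}
  \;=\; \log \frac{p(x \mid y)}{p(x)},
  \qquad
  I(X; Y) \;=\; \mathbb{E}_{p(x, y)}\bigl[\operatorname{pmi}(x; y)\bigr].
\]
```

A large positive value means the image is much more likely under the prompt than unconditionally, which is the sense in which it can serve as an alignment score for a single prompt-image pair.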

[213]  arXiv:2405.20761 [pdf, other]
Title: Share Your Secrets for Privacy! Confidential Forecasting with Vertical Federated Learning
Comments: Submitted to the 27TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2024)
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

Vertical federated learning (VFL) is a promising area for time series forecasting in industrial applications, such as predictive maintenance and machine control. Critical challenges to address in manufacturing include data privacy and over-fitting on small and noisy datasets during both training and inference. Additionally, to increase industry adaptability, such forecasting models must scale well with the number of parties while ensuring strong convergence and low-tuning complexity. We address those challenges and propose 'Secret-shared Time Series Forecasting with VFL' (STV), a novel framework that exhibits the following key features: i) a privacy-preserving algorithm for forecasting with SARIMAX and autoregressive trees on vertically partitioned data; ii) serverless forecasting using secret sharing and multi-party computation; iii) novel N-party algorithms for matrix multiplication and inverse operations for direct parameter optimization, giving strong convergence with minimal hyperparameter tuning complexity. We conduct evaluations on six representative datasets from public and industry-specific contexts. Our results demonstrate that STV's forecasting accuracy is comparable to that of centralized approaches. They also show that our direct optimization can outperform centralized methods, which include state-of-the-art diffusion models and long short-term memory, by 23.81% on forecasting accuracy. We also conduct a scalability analysis by examining the communication costs of direct and iterative optimization to navigate the choice between the two. Code and appendix are available: https://github.com/adis98/STV
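
To give a feel for the secret-sharing building block mentioned in the abstract, here is a minimal sketch of textbook additive secret sharing over a prime field. It is not the STV protocol from the linked repository; the modulus `PRIME` and the helper names `share` and `reconstruct` are assumptions chosen for illustration.

```python
import secrets

PRIME = 2**61 - 1  # field modulus; an illustrative choice, not taken from STV


def share(value: int, n_parties: int) -> list:
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: list) -> int:
    """Recover the secret by summing all shares modulo PRIME."""
    return sum(shares) % PRIME


# Two private inputs, each split among three parties.
a_shares = share(120, 3)
b_shares = share(45, 3)

# Each party adds the two shares it holds locally; the inputs stay hidden.
sum_shares = [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]
total = reconstruct(sum_shares)
```

Addition works share-by-share with no communication, whereas multiplication of shared values needs an interactive protocol, which is one reason N-party matrix multiplication is listed as a separate contribution.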

[214]  arXiv:2405.20762 [pdf, ps, other]
Title: Comparison of Access Control Approaches for Graph-Structured Data
Comments: Extended version of an accepted paper at the 21st International Conference on Security and Cryptography (SECRYPT), 2024
Subjects: Cryptography and Security (cs.CR)

Access control is the enforcement of the authorization policy, which defines subjects, resources, and access rights. Graph-structured data requires advanced, flexible, and fine-grained access control due to its complex structure as sequences of alternating vertices and edges. Several research works focus on protecting property graph-structured data, enforcing fine-grained access control, and proving the feasibility and applicability of their concept. However, they differ conceptually and technically. We select works from our systematic literature review on authorization and access control for different database models in addition to recent ones. Based on defined criteria, we exclude research works with different objectives, such as no protection of graph-structured data, graph models other than the property graph, coarse-grained access control approaches, or no application in a graph datastore (i.e., no proof-of-concept implementation). The latest versions of the remaining works are discussed in detail in terms of their access control approach as well as authorization policy definition and enforcement. Finally, we analyze the strengths and limitations of the selected works and provide a comparison with respect to different aspects, including the base access control model, open/closed policy, negative permission support, and datastore-independent enforcement.

[215]  arXiv:2405.20763 [pdf, other]
Title: Improving Generalization and Convergence by Enhancing Implicit Regularization
Comments: 35 pages
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that IRE can be practically incorporated with generic base optimizers without introducing significant computational overhead. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a 2× speed-up compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees, showing that IRE can substantially accelerate the convergence towards flat minima in Sharpness-aware Minimization (SAM).

[216]  arXiv:2405.20764 [pdf, other]
Title: CoMoFusion: Fast and High-quality Fusion of Infrared and Visible Image with Consistency Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generative models are widely used to model the distribution of fused images in the field of infrared and visible image fusion. However, current fusion methods based on generative models often suffer from unstable training and slow inference. To tackle this problem, a novel fusion method based on a consistency model is proposed, termed CoMoFusion, which can generate high-quality images and achieve fast inference. Specifically, the consistency model is used to construct multi-modal joint features in the latent space with the forward and reverse process. Then, the infrared and visible features extracted by the trained consistency model are fed into a fusion module to generate the final fused image. To enhance the texture and salient information of fused images, a novel loss based on pixel value selection is also designed. Extensive experiments on public datasets show that our method achieves SOTA fusion performance compared with existing fusion methods.

[217]  arXiv:2405.20768 [pdf, other]
Title: Expanded Gating Ranges Improve Activation Functions
Authors: Allen Hao Huang
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

Activation functions are core components of all deep learning architectures. Currently, the most popular activation functions are smooth ReLU variants like GELU and SiLU. These are self-gated activation functions where the range of the gating function is between zero and one. In this paper, we explore the viability of using arctan as a gating mechanism. A self-gated activation function that uses arctan as its gating function has a monotonically increasing first derivative. To make this activation function competitive, it is necessary to introduce a trainable parameter for every MLP block to expand the range of the gating function beyond zero and one. We find that this technique also improves existing self-gated activation functions. We conduct an empirical evaluation of Expanded ArcTan Linear Unit (xATLU), Expanded GELU (xGELU), and Expanded SiLU (xSiLU) and show that they outperform existing activation functions within a transformer architecture. Additionally, expanded gating ranges show promising results in improving first-order Gated Linear Units (GLU).
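
The gating idea can be sketched in a few lines. The abstract does not give the exact formulas, so the rescaling of arctan to (0, 1) and the linear stretch to (-alpha, 1 + alpha) below are assumptions made for illustration, and `alpha` is a plain float here rather than a trainable per-block parameter.

```python
import math


def arctan_gate(x: float) -> float:
    """Arctan rescaled so that its range is the open interval (0, 1)."""
    return math.atan(x) / math.pi + 0.5


def expanded_gate(x: float, alpha: float) -> float:
    """Stretch the gate's range from (0, 1) to (-alpha, 1 + alpha)."""
    return arctan_gate(x) * (1.0 + 2.0 * alpha) - alpha


def xatlu(x: float, alpha: float = 0.0) -> float:
    """Self-gated unit: the input multiplied by its own (expanded) gate."""
    return x * expanded_gate(x, alpha)


# With alpha = 0 the gate stays in (0, 1); a positive alpha lets it go negative
# for very negative inputs and exceed 1 for very positive ones.
samples = [xatlu(x, alpha=0.25) for x in (-3.0, -1.0, 0.0, 1.0, 3.0)]
```

Under the same assumption, the identical stretch could be applied to a sigmoid gate (SiLU) or a Gaussian CDF gate (GELU) by swapping out `arctan_gate`.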

[218]  arXiv:2405.20769 [pdf, other]
Title: Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition
Subjects: Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)

We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied.
Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the composition of a subsampled mechanism are determined by self-composing the worst-case datasets for the uncomposed mechanism. We show that this is not true in general. Second, Poisson subsampling is sometimes assumed to have similar privacy guarantees compared to sampling without replacement. We show that the privacy guarantees may in fact differ significantly between the two sampling schemes. In particular, we give an example of hyperparameters that result in $\varepsilon \approx 1$ for Poisson subsampling and $\varepsilon > 10$ for sampling without replacement. This occurs for some parameters that could realistically be chosen for DP-SGD.
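
The two sampling schemes contrasted above differ in how a batch is drawn, which the following sketch makes concrete. It does not compute any privacy parameters; the function names and the example sizes (1000 records, rate 0.1, batch size 100) are assumptions for illustration.

```python
import random


def poisson_subsample(n: int, q: float, rng: random.Random) -> list:
    """Include each index independently with probability q; batch size varies."""
    return [i for i in range(n) if rng.random() < q]


def sample_without_replacement(n: int, m: int, rng: random.Random) -> list:
    """Draw exactly m distinct indices uniformly at random; batch size is fixed."""
    return rng.sample(range(n), m)


rng = random.Random(0)
poisson_batch = poisson_subsample(1000, 0.1, rng)         # size is random, about 100 on average
fixed_batch = sample_without_replacement(1000, 100, rng)  # size is always exactly 100
```

Both schemes have the same expected batch size here, yet they are different mechanisms, which is consistent with the abstract's warning that their privacy guarantees should not be assumed interchangeable.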

[219]  arXiv:2405.20770 [pdf, other]
Title: Large Language Model Sentinel: Advancing Adversarial Robustness by LLM Agent
Authors: Guang Lin, Qibin Zhao
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Over the past two years, the use of large language models (LLMs) has advanced rapidly. While these LLMs offer considerable convenience, they also raise security concerns, as LLMs are vulnerable to adversarial attacks by some well-designed textual perturbations. In this paper, we introduce a novel defense technique named Large LAnguage MOdel Sentinel (LLAMOS), which is designed to enhance the adversarial robustness of LLMs by purifying the adversarial textual examples before feeding them into the target LLM. Our method comprises two main components: a) Agent instruction, which can simulate a new agent for adversarial defense, altering minimal characters to maintain the original meaning of the sentence while defending against attacks; b) Defense guidance, which provides strategies for modifying clean or adversarial examples to ensure effective defense and accurate outputs from the target LLMs. Remarkably, the defense agent demonstrates robust defensive capabilities even without learning from adversarial examples. Additionally, we conduct an intriguing adversarial experiment where we develop two agents, one for defense and one for attack, and engage them in mutual confrontation. During the adversarial interactions, neither agent completely beat the other. Extensive experiments on both open-source and closed-source LLMs demonstrate that our method effectively defends against adversarial attacks, thereby enhancing adversarial robustness.

[220]  arXiv:2405.20771 [pdf, other]
Title: Towards Black-Box Membership Inference Attack for Diffusion Models
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Identifying whether an artwork was used to train a diffusion model is an important research topic, given the rising popularity of AI-generated art and the associated copyright concerns. This work approaches the problem from the membership inference attack (MIA) perspective. We first identify the limitations of applying existing MIA methods for copyright protection: the required access to internal U-nets and the choice of non-member datasets for evaluation. To address these problems, we introduce a novel black-box membership inference attack method that operates without needing access to the model's internal U-net. We then construct a DALL-E generated dataset for a more comprehensive evaluation. We validate our method across various setups, and our experimental results outperform previous works.

[221]  arXiv:2405.20772 [pdf, ps, other]
Title: Reinforcement Learning for Sociohydrology
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

In this study, we discuss how reinforcement learning (RL) provides an effective and efficient framework for solving sociohydrology problems. The efficacy of RL for these types of problems is evident because of its ability to update policies in an iterative manner - something that is also foundational to sociohydrology, where we are interested in representing the co-evolution of human-water interactions. We present a simple case study to demonstrate the implementation of RL in a problem of runoff reduction through management decisions related to changes in land-use land-cover (LULC). We then discuss the benefits of RL for these types of problems and share our perspectives on the future research directions in this area.

[222]  arXiv:2405.20773 [pdf, other]
Title: Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

With the advent and widespread deployment of Multimodal Large Language Models (MLLMs), ensuring their safety has become increasingly critical. Achieving this objective requires proactively discovering the vulnerabilities of MLLMs by exploring attack methods. Thus, structure-based jailbreak attacks, where harmful semantic content is embedded within images, have been proposed to mislead the models. However, previous structure-based jailbreak methods mainly focus on transforming the format of malicious queries, such as converting harmful content into images through typography, which lacks sufficient jailbreak effectiveness and generalizability. To address these limitations, we first introduce the concept of "Role-play" into MLLM jailbreak attacks and propose a novel and effective method called Visual Role-play (VRP). Specifically, VRP leverages Large Language Models to generate detailed descriptions of high-risk characters and create corresponding images based on the descriptions. When paired with benign role-play instruction texts, these high-risk character images effectively mislead MLLMs into generating malicious responses by enacting characters with negative attributes. We further extend our VRP method into a universal setup to demonstrate its generalizability. Extensive experiments on popular benchmarks show that VRP outperforms the strongest baselines, Query relevant and FigStep, by an average Attack Success Rate (ASR) margin of 14.3% across all models.

[223]  arXiv:2405.20774 [pdf, other]
Title: Exploring Backdoor Attacks against Large Language Model-based Decision Making
Comments: 27 pages, including main paper, references, and appendix
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have shown significant promise in decision-making tasks when fine-tuned on specific applications, leveraging their inherent common sense and reasoning abilities learned from vast amounts of data. However, these systems are exposed to substantial safety and security risks during the fine-tuning phase. In this work, we propose the first comprehensive framework for Backdoor Attacks against LLM-enabled Decision-making systems (BALD), systematically exploring how such attacks can be introduced during the fine-tuning phase across various channels. Specifically, we propose three attack mechanisms and corresponding backdoor optimization methods to attack different components in the LLM-based decision-making pipeline: word injection, scenario manipulation, and knowledge injection. Word injection embeds trigger words directly into the query prompt. Scenario manipulation occurs in the physical environment, where a high-level backdoor semantic scenario triggers the attack. Knowledge injection conducts backdoor attacks on retrieval augmented generation (RAG)-based LLM systems, strategically injecting word triggers into poisoned knowledge while ensuring the information remains factually accurate for stealthiness. We conduct extensive experiments with three popular LLMs (GPT-3.5, LLaMA2, PaLM2), using two datasets (HighwayEnv, nuScenes), and demonstrate the effectiveness and stealthiness of our backdoor triggers and mechanisms. Finally, we critically assess the strengths and weaknesses of our proposed approaches, highlight the inherent vulnerabilities of LLMs in decision-making tasks, and evaluate potential defenses to safeguard LLM-based decision making systems.

[224]  arXiv:2405.20775 [pdf, other]
Title: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)

Security concerns related to Large Language Models (LLMs) have been extensively explored, yet the safety implications for Multimodal Large Language Models (MLLMs), particularly in medical contexts (MedMLLMs), remain insufficiently studied. This paper delves into the underexplored security vulnerabilities of MedMLLMs, especially when deployed in clinical environments where the accuracy and relevance of question-and-answer interactions are critically tested against complex medical challenges. By combining existing clinical medical data with atypical natural phenomena, we redefine two types of attacks: mismatched malicious attack (2M-attack) and optimized mismatched malicious attack (O2M-attack). Using our own constructed voluminous 3MAD dataset, which covers a wide range of medical image modalities and harmful medical scenarios, we conduct a comprehensive analysis and propose the MCM optimization method, which significantly enhances the attack success rate on MedMLLMs. Evaluations with this dataset and novel attack methods, including white-box attacks on LLaVA-Med and transfer attacks on four other state-of-the-art models, indicate that even MedMLLMs designed with enhanced security features are vulnerable to security breaches. Our work underscores the urgent need for a concerted effort to implement robust security measures and enhance the safety and efficacy of open-source MedMLLMs, particularly given the potential severity of jailbreak attacks and other malicious or clinically significant exploits in medical settings. For further research and replication, anonymous access to our code is available at https://github.com/dirtycomputer/O2M_attack. Warning: Medical large model jailbreaking may generate content that includes unverified diagnoses and treatment recommendations. Always consult professional medical advice.

[225]  arXiv:2405.20776 [pdf, other]
Title: Federated Learning with Blockchain-Enhanced Machine Unlearning: A Trustworthy Approach
Comments: 13 pages, 25 figures
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

With the growing need to comply with privacy regulations and respond to user data deletion requests, integrating machine unlearning into IoT-based federated learning has become imperative. Traditional unlearning methods, however, often lack verifiable mechanisms, leading to challenges in establishing trust. This paper delves into the innovative integration of blockchain technology with federated learning to surmount these obstacles. Blockchain fortifies the unlearning process through its inherent qualities of immutability, transparency, and robust security. It facilitates verifiable certification, harmonizes security with privacy, and sustains system efficiency. We introduce a framework that melds blockchain with federated learning, thereby ensuring an immutable record of unlearning requests and actions. This strategy not only bolsters the trustworthiness and integrity of the federated learning model but also adeptly addresses efficiency and security challenges typical in IoT environments. Our key contributions encompass a certification mechanism for the unlearning process, the enhancement of data security and privacy, and the optimization of data management to ensure system responsiveness in IoT scenarios.

[226]  arXiv:2405.20777 [pdf, other]
Title: Black-Box Detection of Language Model Watermarks
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Watermarking has emerged as a promising way to detect LLM-generated text. To apply a watermark, an LLM provider, given a secret key, augments generations with a signal that is later detectable by any party with the same key. Recent work has proposed three main families of watermarking schemes, two of which focus on the property of preserving the LLM distribution. This is motivated by it being a tractable proxy for maintaining LLM capabilities, but also by the idea that concealing a watermark deployment makes it harder for malicious actors to hide misuse by avoiding a certain LLM or attacking its watermark. Yet, despite much discourse around detectability, no prior work has investigated whether any of these scheme families is detectable in a realistic black-box setting. We tackle this for the first time, developing rigorous statistical tests to detect the presence of all three most popular watermarking scheme families using only a limited number of black-box queries. We experimentally confirm the effectiveness of our methods on a range of schemes and a diverse set of open-source models. Our findings indicate that current watermarking schemes are more detectable than previously believed, and that obscuring the fact that a watermark was deployed may not be a viable way for providers to protect against adversaries. We further apply our methods to test for watermark presence behind the most popular public APIs: GPT4, Claude 3, Gemini 1.0 Pro, finding no strong evidence of a watermark at this point in time.

[227]  arXiv:2405.20778 [pdf, other]
Title: Improved Generation of Adversarial Examples Against Safety-aligned LLMs
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Despite numerous efforts to ensure large language models (LLMs) adhere to safety standards and produce harmless content, some successes have been achieved in bypassing these restrictions, known as jailbreak attacks against LLMs. Adversarial prompts generated using gradient-based methods exhibit outstanding performance in performing jailbreak attacks automatically. Nevertheless, due to the discrete nature of texts, the input gradient of LLMs struggles to precisely reflect the magnitude of loss change that results from token replacements in the prompt, leading to limited attack success rates against safety-aligned LLMs, even in the white-box setting. In this paper, we explore a new perspective on this problem, suggesting that it can be alleviated by leveraging innovations inspired by transfer-based attacks that were originally proposed for attacking black-box image classification models. For the first time, we appropriate the ideas of effective methods among these transfer-based attacks, i.e., Skip Gradient Method and Intermediate Level Attack, for improving the effectiveness of automatically generated adversarial examples against white-box LLMs. With appropriate adaptations, we inject these ideas into gradient-based adversarial prompt generation processes and achieve significant performance gains without introducing obvious computational cost. Meanwhile, by discussing mechanisms behind the gains, new insights are drawn, and proper combinations of these methods are also developed. Our empirical results show that the developed combination achieves >30% absolute increase in attack success rates compared with GCG for attacking the Llama-2-7B-Chat model on AdvBench.

[228]  arXiv:2405.20779 [pdf, ps, other]
Title: Asymptotic utility of spectral anonymization
Comments: 16 pages, 6 figures
Subjects: Cryptography and Security (cs.CR); Methodology (stat.ME)

In the contemporary data landscape characterized by multi-source data collection and third-party sharing, ensuring individual privacy stands as a critical concern. While various anonymization methods exist, their utility preservation and privacy guarantees remain challenging to quantify. In this work, we address this gap by studying the utility and privacy of the spectral anonymization (SA) algorithm, particularly in an asymptotic framework. Unlike conventional anonymization methods that directly modify the original data, SA operates by perturbing the data in a spectral basis and subsequently reverting them to their original basis. Alongside the original version $\mathcal{P}$-SA, employing random permutation transformation, we introduce two novel SA variants: $\mathcal{J}$-spectral anonymization and $\mathcal{O}$-spectral anonymization, which employ sign-change and orthogonal matrix transformations, respectively. We show how well, under some practical assumptions, these SA algorithms preserve the first and second moments of the original data. Our results reveal, in particular, that the asymptotic efficiency of all three SA algorithms in covariance estimation is exactly 50% when compared to the original data. To assess the applicability of these asymptotic results in practice, we conduct a simulation study with finite data and also evaluate the privacy protection offered by these algorithms using distance-based record linkage. Our research reveals that while no method exhibits clear superiority in finite-sample utility, $\mathcal{O}$-SA distinguishes itself for its exceptional privacy preservation, never producing identical records, albeit with increased computational complexity. Conversely, $\mathcal{P}$-SA emerges as a computationally efficient alternative, demonstrating unmatched efficiency in mean estimation.
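
The perturb-in-a-spectral-basis idea from the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the use of an SVD of the centered data as the spectral basis, the independent permutation of each spectral column, and the function name `permutation_spectral_anonymize` are all assumptions made for illustration.

```python
import numpy as np

def permutation_spectral_anonymize(X, rng):
    """Illustrative permutation-based spectral perturbation (assumed details)."""
    mean = X.mean(axis=0)
    # Move to a spectral basis via the SVD of the centered data (assumption).
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    # Shuffle each spectral column independently across records.
    U_perm = np.column_stack(
        [rng.permutation(U[:, j]) for j in range(U.shape[1])]
    )
    # Revert to the original basis and restore the mean.
    return U_perm @ np.diag(s) @ Vt + mean

rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
X = rng.normal(size=(200, 3)) @ mixing  # synthetic correlated data
X_anon = permutation_spectral_anonymize(X, rng)
```

Because a permutation leaves each column sum unchanged, the column means of `X_anon` match those of `X` up to floating-point error in this sketch, while individual records differ.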

[229]  arXiv:2405.20782 [pdf, other]
Title: Universal Exact Compression of Differentially Private Mechanisms
Comments: 30 pages, 3 figures
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (stat.ML)

To reduce the communication cost of differential privacy mechanisms, we introduce a novel construction, called Poisson private representation (PPR), designed to compress and simulate any local randomizer while ensuring local differential privacy. Unlike previous simulation-based local differential privacy mechanisms, PPR exactly preserves the joint distribution of the data and the output of the original local randomizer. Hence, the PPR-compressed privacy mechanism retains all desirable statistical properties of the original privacy mechanism such as unbiasedness and Gaussianity. Moreover, PPR achieves a compression size within a logarithmic gap from the theoretical lower bound. Using the PPR, we give a new order-wise trade-off between communication, accuracy, central and local differential privacy for distributed mean estimation. Experiment results on distributed mean estimation show that PPR consistently gives a better trade-off between communication, accuracy and central differential privacy compared to the coordinate subsampled Gaussian mechanism, while also providing local differential privacy.

[230]  arXiv:2405.20785 [pdf, other]
Title: How the Future Works at SOUPS: Analyzing Future Work Statements and Their Impact on Usable Security and Privacy Research
Authors: Jacques Suray (1), Jan H. Klemmer (2), Juliane Schmüser (2), Sascha Fahl (2) ((1) Leibniz University Hannover, (2) CISPA Helmholtz Center for Information Security)
Comments: 16 pages, 4 figures, 2 tables
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)

Extending knowledge by identifying and investigating valuable research questions and problems is a core function of research. Research publications often suggest avenues for future work to extend and build upon their results. Considering these suggestions can contribute to developing research ideas that build upon previous work and produce results that tie into existing knowledge. Usable security and privacy researchers commonly add future work statements to their publications. However, our community lacks an in-depth understanding of their prevalence, quality, and impact on future research.
Our work aims to address this gap in the research literature. We reviewed all 27 papers from the 2019 SOUPS proceedings and analyzed their future work statements. Additionally, we analyzed 978 publications that cite any paper from SOUPS 2019 proceedings to assess their future work statements' impact. We find that most papers from the SOUPS 2019 proceedings include future work statements. However, they are often unspecific or ambiguous, and not always easy to find. Therefore, the citing publications often matched the future work statements' content thematically, but rarely explicitly acknowledged them, indicating a limited impact. We conclude with recommendations for the usable security and privacy community to improve the utility of future work statements by making them more tangible and actionable, and avenues for future work.

[231]  arXiv:2405.20786 [pdf, other]
Title: Stratified Avatar Generation from Sparse Observations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

Estimating 3D full-body avatars from AR/VR devices is essential for creating immersive experiences in AR/VR applications. This task is challenging due to the limited input from Head Mounted Devices, which capture only sparse observations from the head and hands. Predicting the full-body avatars, particularly the lower body, from these sparse observations presents significant difficulties. In this paper, we are inspired by the inherent property of the kinematic tree defined in the Skinned Multi-Person Linear (SMPL) model, where the upper body and lower body share only one common ancestor node, bringing the potential of decoupled reconstruction. We propose a stratified approach to decouple the conventional full-body avatar reconstruction pipeline into two stages, with the reconstruction of the upper body first and a subsequent reconstruction of the lower body conditioned on the previous stage. To implement this straightforward idea, we leverage the latent diffusion model as a powerful probabilistic generator, and train it to follow the latent distribution of decoupled motions explored by a VQ-VAE encoder-decoder model. Extensive experiments on AMASS mocap dataset demonstrate our state-of-the-art performance in the reconstruction of full-body motions.

[232]  arXiv:2405.20787 [pdf, other]
Title: PGA-SciRE: Harnessing LLM on Data Augmentation for Enhancing Scientific Relation Extraction
Subjects: Computation and Language (cs.CL)

Relation Extraction (RE) aims at recognizing the relation between pairs of entities mentioned in a text. Advances in LLMs have had a tremendous impact on NLP. In this work, we propose a textual data augmentation framework called PGA for improving the performance of models for RE in the scientific domain. The framework introduces two ways of data augmentation: utilizing an LLM to obtain pseudo-samples with the same sentence meaning but different representations and forms by paraphrasing the original training set samples, and instructing an LLM to generate sentences that implicitly contain information about the corresponding labels based on the relation and entity of the original training set samples. These two kinds of pseudo-samples participate in the training of the RE model together with the original dataset, respectively. In our experiments, the PGA framework improves the F1 scores of three mainstream models for RE within the scientific domain. Also, using an LLM to obtain samples can effectively reduce the cost of manually labeling data.

[233]  arXiv:2405.20790 [pdf, other]
Title: Intersectional Unfairness Discovery
Comments: ICML-2024 Camera-ready
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

AI systems have been shown to produce unfair results for certain subgroups of the population, highlighting the need to understand bias on certain sensitive attributes. Current research often falls short, primarily focusing on the subgroups characterized by a single sensitive attribute, while neglecting the nature of intersectional fairness of multiple sensitive attributes. This paper focuses on one fundamental aspect of this problem by discovering diverse high-bias subgroups under intersectional sensitive attributes. Specifically, we propose a Bias-Guided Generative Network (BGGN). By treating each bias value as a reward, BGGN efficiently generates high-bias intersectional sensitive attributes. Experiments on real-world text and image datasets demonstrate a diverse and efficient discovery of BGGN. To further evaluate the generated unseen but possibly unfair intersectional sensitive attributes, we formulate them as prompts and use modern generative AI to produce new texts and images. The results of frequently generating biased data provide new insights into discovering potential unfairness in popular modern generative AI systems. Warning: This paper contains generative examples that are offensive in nature.

[234]  arXiv:2405.20791 [pdf, other]
Title: GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Decoupling the illumination in 3D scenes is crucial for novel view synthesis and relighting. In this paper, we propose a novel method for representing a scene illuminated by a point light using a set of relightable 3D Gaussian points. Inspired by the Blinn-Phong model, our approach decomposes the scene into ambient, diffuse, and specular components, enabling the synthesis of realistic lighting effects. To facilitate the decomposition of geometric information independent of lighting conditions, we introduce a novel bilevel optimization-based meta-learning framework. The fundamental idea is to view the rendering tasks under various lighting positions as a multi-task learning problem, which our meta-learning approach effectively addresses by generalizing the learned Gaussian geometries not only across different viewpoints but also across diverse light positions. Experimental results demonstrate the effectiveness of our approach in terms of training efficiency and rendering quality compared to existing methods for free-viewpoint relighting.

[235]  arXiv:2405.20794 [pdf, ps, other]
Title: Model Interpretation and Explainability: Towards Creating Transparency in Prediction Models
Subjects: Machine Learning (cs.LG)

Explainable AI (XAI) has a counterpart in analytical modeling which we refer to as model explainability. We tackle the issue of model explainability in the context of prediction models. We analyze a dataset of loans from a credit card company and apply three stages: execute and compare four different prediction methods, apply the best known explainability techniques in the current literature to the model training sets to identify feature importance (FI) (static case), and finally cross-check whether the FI set holds up under what-if prediction scenarios for continuous and categorical variables (dynamic case). We found inconsistency in FI identification between the static and dynamic cases. We summarize the state of the art in model explainability and suggest further research to advance the field.

[236]  arXiv:2405.20795 [pdf, other]
Title: InsightSee: Advancing Multi-agent Vision-Language Models for Enhanced Visual Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Accurate visual understanding is imperative for advancing autonomous systems and intelligent robots. Despite the powerful capabilities of vision-language models (VLMs) in processing complex visual scenes, precisely recognizing obscured or ambiguously presented visual elements remains challenging. To tackle such issues, this paper proposes InsightSee, a multi-agent framework to enhance VLMs' interpretative capabilities in handling complex visual understanding scenarios. The framework comprises a description agent, two reasoning agents, and a decision agent, which are integrated to refine the process of visual information interpretation. The design of these agents and the mechanisms by which they can be enhanced in visual information processing are presented. Experimental results demonstrate that the InsightSee framework not only boosts performance on specific visual tasks but also retains the original models' strength. The proposed framework outperforms state-of-the-art algorithms in 6 out of 9 benchmark tests, with a substantial advancement in multimodal understanding.

[237]  arXiv:2405.20797 [pdf, other]
Title: Ovis: Structural Embedding Alignment for Multimodal Large Language Model
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Current Multimodal Large Language Models (MLLMs) typically integrate a pre-trained LLM with another pre-trained vision transformer through a connector, such as an MLP, endowing the LLM with visual capabilities. However, the misalignment between two embedding strategies in MLLMs -- the structural textual embeddings based on an embedding look-up table and the continuous embeddings generated directly by the vision encoder -- poses challenges for a more seamless fusion of visual and textual information. We propose Ovis, a novel MLLM architecture designed to structurally align visual and textual embeddings. Ovis integrates an additional learnable visual embedding table into the visual encoder's process. To capture rich visual semantics, each image patch indexes the visual embedding table multiple times, resulting in a final visual embedding that is a probabilistic combination of the indexed embeddings. This structural approach mirrors the method used for generating textual embeddings. Empirical evaluations on various multimodal benchmarks demonstrate that Ovis outperforms open-source MLLMs of similar parameter scales and even surpasses the proprietary model Qwen-VL-Plus overall. These results highlight the potential of Ovis' structured visual representation for advancing MLLM architectural design and promoting more effective multimodal learning. Both the source code and the training dataset of Ovis will be made publicly available.
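
The "probabilistic combination of the indexed embeddings" mentioned in this abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the Ovis code: the softmax over per-patch scores, the toy three-row table, and the names `softmax` and `visual_embedding` are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def visual_embedding(patch_scores, embedding_table):
    """Probability-weighted mix of the rows of a visual embedding table.

    patch_scores: one score per table row for a single image patch
    (assumed to come from the visual encoder).
    """
    probs = softmax(patch_scores)
    dim = len(embedding_table[0])
    return [
        sum(p * row[j] for p, row in zip(probs, embedding_table))
        for j in range(dim)
    ]

# Toy table with three "visual words" of dimension 2 (hypothetical values).
table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
emb = visual_embedding([0.0, 0.0, 0.0], table)  # equal scores: plain average
```

With equal scores every row gets weight 1/3, so the result is the average of the table rows; sharper scores would move the embedding toward a single row, which is closer to the hard look-up used for text tokens.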

[238]  arXiv:2405.20800 [pdf, other]
Title: Shape Constraints in Symbolic Regression using Penalized Least Squares
Subjects: Machine Learning (cs.LG); Symbolic Computation (cs.SC)

We study the addition of shape constraints and their consideration during the parameter estimation step of symbolic regression (SR). Shape constraints serve as a means to introduce prior knowledge about the shape of the otherwise unknown model function into SR. Unlike previous works that have explored shape constraints in SR, we propose minimizing shape constraint violations during parameter estimation using gradient-based numerical optimization.
We test three algorithm variants to evaluate their performance in identifying three symbolic expressions from a synthetically generated data set. This paper examines two benchmark scenarios: one with varying noise levels and another with reduced amounts of training data. The results indicate that incorporating shape constraints into the expression search is particularly beneficial when data is scarce. Compared to using shape constraints only in the selection process, our approach of minimizing violations during parameter estimation shows a statistically significant benefit in some of our test cases, without being significantly worse in any instance.
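
As a rough illustration of penalizing shape-constraint violations during parameter estimation, the sketch below fits a fixed expression by gradient descent on a penalized least-squares objective. The expression form f(x) = a*x + b*x^2, the monotonicity constraint f'(z) >= 0, the quadratic penalty, the weight `lam`, and the toy data are all assumptions for illustration; the abstract does not specify these details.

```python
def fit(xs, ys, checks, lam, lr=0.005, steps=2000):
    """Minimize sum (f(x)-y)^2 + lam * sum max(0, -f'(z))^2 for f = a*x + b*x^2."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        ga = gb = 0.0
        # Gradient of the squared-error term.
        for x, y in zip(xs, ys):
            r = a * x + b * x * x - y
            ga += 2 * r * x
            gb += 2 * r * x * x
        # Gradient of the penalty on monotonicity violations at check points.
        for z in checks:
            v = -(a + 2 * b * z)  # positive when f'(z) < 0
            if v > 0:
                ga += lam * 2 * v * (-1.0)
                gb += lam * 2 * v * (-2.0 * z)
        a -= lr * ga
        b -= lr * gb
    return a, b

def worst_violation(a, b, checks):
    """Largest amount by which f'(z) falls below zero on the check points."""
    return max(0.0, max(-(a + 2 * b * z) for z in checks))

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [0.3, -0.5, 0.0, 0.4, 1.2]  # toy noisy samples of an increasing trend
plain = fit(xs, ys, xs, lam=0.0)
penalized = fit(xs, ys, xs, lam=10.0)
```

On this toy data the unpenalized fit is clearly decreasing near x = -1, while the penalized fit trades a little squared error for a much smaller violation. A soft penalty reduces violations rather than forbidding them.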

[239]  arXiv:2405.20804 [pdf, ps, other]
Title: Reachability and Safety Games under TSO Semantics (Extended Version)
Authors: Stephan Spengler
Comments: 22 pages, 8 figures, accepted and to be presented at GandALF 2024
Subjects: Computer Science and Game Theory (cs.GT); Logic in Computer Science (cs.LO)

We consider games played on the transition graph of concurrent programs running under the Total Store Order (TSO) weak memory model. Games are frequently used to model the interaction between a system and its environment, in this case between the concurrent processes and the nondeterministic TSO buffer updates. The game is played by two players, who alternatingly make a move: The process player can execute any enabled instruction of the processes, while the update player takes care of updating the messages in the buffers that are between each process and the shared memory. We show that the reachability and safety problem of this game reduce to the analysis of single-process (non-concurrent) programs. In particular, they exhibit only finite-state behaviour. Because of this, we introduce different notions of fairness, which force the two players to behave in a more realistic way. Both the reachability and safety problem then become undecidable.

[240]  arXiv:2405.20805 [pdf, ps, other]
Title: Multilingual Text Style Transfer: Datasets & Models for Indian Languages
Subjects: Computation and Language (cs.CL)

Text style transfer (TST) involves altering the linguistic style of a text while preserving its core content. This paper focuses on sentiment transfer, a vital TST subtask (Mukherjee et al., 2022a), across a spectrum of Indian languages: Hindi, Magahi, Malayalam, Marathi, Punjabi, Odia, Telugu, and Urdu, expanding upon previous work on English-Bangla sentiment transfer (Mukherjee et al., 2023). We introduce dedicated datasets of 1,000 positive and 1,000 negative style-parallel sentences for each of these eight languages. We then evaluate the performance of various benchmark models categorized into parallel, non-parallel, cross-lingual, and shared learning approaches, including the Llama2 and GPT-3.5 large language models (LLMs). Our experiments highlight the significance of parallel data in TST and demonstrate the effectiveness of the Masked Style Filling (MSF) approach (Mukherjee et al., 2023) in non-parallel techniques. Moreover, cross-lingual and joint multilingual learning methods show promise, offering insights into selecting optimal models tailored to the specific language and task requirements. To the best of our knowledge, this work represents the first comprehensive exploration of the TST task as sentiment transfer across a diverse set of languages.

[241]  arXiv:2405.20806 [pdf, other]
Title: There and Back Again: The AI Alignment Paradox
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been instrumental for improving the output quality, safety, and trustworthiness of today's AI models. This perspective article draws attention to a fundamental challenge inherent in all AI alignment endeavors, which we term the "AI alignment paradox": The better we align AI models with our values, the easier we make it for adversaries to misalign the models. We illustrate the paradox by sketching three concrete example incarnations for the case of language models, each corresponding to a distinct way in which adversaries can exploit the paradox. With AI's increasing real-world impact, it is imperative that a broad community of researchers be aware of the AI alignment paradox and work to find ways to break out of it, in order to ensure the beneficial use of AI for the good of humanity.

[242]  arXiv:2405.20808 [pdf, other]
Title: Optimally Improving Cooperative Learning in a Social Setting
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

We consider a cooperative learning scenario where a collection of networked agents with individually owned classifiers dynamically update their predictions, for the same classification task, through communication or observations of each other's predictions. Clearly if highly influential vertices use erroneous classifiers, there will be a negative effect on the accuracy of all the agents in the network. We ask the following question: how can we optimally fix the prediction of a few classifiers so as to maximize the overall accuracy in the entire network? To this end we consider an aggregate and an egalitarian objective function. We show a polynomial time algorithm for optimizing the aggregate objective function, and show that optimizing the egalitarian objective function is NP-hard. Furthermore, we develop approximation algorithms for the egalitarian improvement. The performance of all of our algorithms is guaranteed by mathematical analysis and backed by experiments on synthetic and real data.

[243]  arXiv:2405.20810 [pdf, other]
Title: Context-aware Difference Distilling for Multi-change Captioning
Comments: Accepted by ACL 2024 main conference (long paper)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multi-change captioning aims to describe complex and coupled changes within an image pair in natural language. Compared with single-change captioning, this task requires the model to have higher-level cognition ability to reason about an arbitrary number of changes. In this paper, we propose a novel context-aware difference distilling (CARD) network to capture all genuine changes for yielding sentences. Given an image pair, CARD first decouples context features that aggregate all similar/dissimilar semantics, termed common/difference context features. Then, the consistency and independence constraints are designed to guarantee the alignment/discrepancy of common/difference context features. Further, the common context features guide the model to mine locally unchanged features, which are subtracted from the pair to distill locally difference features. Next, the difference context features augment the locally difference features to ensure that all changes are distilled. In this way, we obtain an omni-representation of all changes, which is translated into linguistic sentences by a transformer decoder. Extensive experiments on three public datasets show CARD performs favourably against state-of-the-art methods. The code is available at https://github.com/tuyunbin/CARD.

[244]  arXiv:2405.20815 [pdf, other]
Title: Distributed Simulation for Digital Twins of Large-Scale Real-World DiffServ-Based Networks
Comments: 15 pages, 6 figures, accepted by Euro-Par 2024: 30th International European Conference on Parallel and Distributed Computing
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

Digital Twin technology facilitates the monitoring and online analysis of large-scale communication networks. Faster predictions of network performance thus become imperative, especially for analysing Quality of Service (QoS) parameters in large-scale city networks. Discrete Event Simulation (DES) is a standard network analysis technology, and can be further optimised with parallel and distributed execution for speedup, referred to as Parallel Discrete Event Simulation (PDES). However, modelling detailed QoS mechanisms such as DiffServ requires complex event handling for each network router, which can involve excessive simulation events. In addition, current PDES for network analysis mostly adopts conservative scheduling, which suffers from excessive global synchronisation to avoid causality problems. The performance analysis of optimistic PDES for real-world large-scale network topology and complex QoS mechanisms is still inadequate. To address these gaps, this paper proposes a simulation toolkit, Quaint, which leverages an optimistic PDES engine ROSS, for detailed modelling of DiffServ-based networks. A novel event-handling model for each network router is also proposed to significantly reduce the number of events in complex QoS modelling. Quaint has been evaluated using a real-world metropolitan-scale network topology with 5,000 routers/switches. Results show that compared to the conventional simulator OMNeT++/INET, even the sequential mode of Quaint can achieve a speedup of 53 times, and the distributed mode has a speedup of 232 times. Scalability characterisation is conducted to portray the efficiency of distributed execution, and the results indicate the future direction for workload-aware model partitioning.

[245]  arXiv:2405.20818 [pdf, other]
Title: An iterated learning model of language change that mixes supervised and unsupervised learning
Subjects: Computation and Language (cs.CL); Adaptation and Self-Organizing Systems (nlin.AO); Populations and Evolution (q-bio.PE)

The iterated learning model is an agent-based model of language change in which language is transmitted from a tutor to a pupil which itself becomes a tutor to a new pupil, and so on. Languages that are stable, expressive, and compositional arise spontaneously as a consequence of a language transmission bottleneck. Previous models have implemented an agent's mapping from signals to meanings using an artificial neural network decoder, but have relied on an unrealistic and computationally expensive process of obversion to implement the associated encoder, mapping from meanings to signals. Here, a new model is presented in which both decoder and encoder are neural networks, trained separately through supervised learning, and trained together through unsupervised learning in the form of an autoencoder. This avoids the substantial computational burden entailed in obversion and introduces a mixture of supervised and unsupervised learning as observed during human development.

[246]  arXiv:2405.20819 [pdf, ps, other]
Title: Heuristic evaluations of back support, shoulder support, hand grip strength support, and sit-stand support exoskeletons using universal design principles
Subjects: Human-Computer Interaction (cs.HC)

Occupational exoskeletons promise to reduce the incidence of musculoskeletal injuries; however, we do not know if their designs allow universal use by all workers. We also do not know how easy the tasks of assembling, donning, doffing, and disassembling exoskeletons are. The purpose of our study was to heuristically evaluate a back support, a shoulder support, a handgrip strength support, and a sit-stand exoskeleton for how well they are designed for universal use when assembling, donning, doffing, and disassembling the exoskeleton. Seven evaluators used universal design principles and associated criteria to independently evaluate and rate four exoskeletons when assembling, donning, doffing, and disassembling the devices. The rating scale was a Likert-type scale, where a rating of 1 represented not at all, and a rating of 5 represented an excellent design with respect to the universal design criteria for the task. The results indicate that providing perceptible information to the user, making the design equitable to use for a diverse set of users, making the design simple and intuitive to use with adequate feedback, and designing to prevent user errors, and when errors are made, allowing the user to recover quickly from the errors, were rated poorly. Assembling and donning tasks presented the most challenges.

[247]  arXiv:2405.20820 [pdf, ps, other]
Title: Constrained Dynamics Simulation: More With Less
Comments: Accepted submission to RSS:24 Pioneers Workshop
Subjects: Robotics (cs.RO)

Efficient robot dynamics simulation is a fundamental problem that is key to robot control, identification, design and analysis. This research statement explores my current progress in this field and future research directions.

[248]  arXiv:2405.20821 [pdf, other]
Title: Pursuing Overall Welfare in Federated Learning through Sequential Decision Making
Comments: Accepted at ICML 2024
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

In traditional federated learning, a single global model cannot perform equally well for all clients. Therefore, the need to achieve client-level fairness in federated systems has been emphasized, which can be realized by modifying the static aggregation scheme for updating the global model to an adaptive one, in response to the local signals of the participating clients. Our work reveals that existing fairness-aware aggregation strategies can be unified into an online convex optimization framework, that is, a central server's sequential decision making process. To enhance the decision making capability, we propose simple and intuitive improvements for suboptimal designs within existing methods, presenting AAggFF. Considering practical requirements, we further subdivide our method into variants tailored for the cross-device and the cross-silo settings, respectively. Theoretical analyses guarantee sublinear regret upper bounds for both settings: $\mathcal{O}(\sqrt{T \log{K}})$ for the cross-device setting, and $\mathcal{O}(K \log{T})$ for the cross-silo setting, with $K$ clients and $T$ federation rounds. Extensive experiments demonstrate that the federated system equipped with AAggFF achieves a better degree of client-level fairness than existing methods in both practical settings. Code is available at https://github.com/vaseline555/AAggFF

[249]  arXiv:2405.20824 [pdf, ps, other]
Title: Online Convex Optimisation: The Optimal Switching Regret for all Segmentations Simultaneously
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We consider the classic problem of online convex optimisation. Whereas the notion of static regret is relevant for stationary problems, the notion of switching regret is more appropriate for non-stationary problems. A switching regret is defined relative to any segmentation of the trial sequence, and is equal to the sum of the static regrets of each segment. In this paper we show that, perhaps surprisingly, we can achieve the asymptotically optimal switching regret on every possible segmentation simultaneously. Our algorithm for doing so is very efficient, with space and per-trial time complexity that are logarithmic in the time horizon. Our algorithm also obtains novel bounds on its dynamic regret, adapting to variations in the rate of change of the comparator sequence.
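
As an illustration of the definition in this abstract, switching regret can be written out as follows. The notation is assumed here and is not taken from the paper: $\ell_t$ is the convex loss at trial $t$, $x_t$ the learner's action, $\mathcal{X}$ the feasible set, and $1 = t_1 < t_2 < \dots < t_m \le T$ the segment start times, with $t_{m+1} = T + 1$.

```latex
% Static regret of segment k, measured against the best fixed action on that segment
R_k = \sum_{t=t_k}^{t_{k+1}-1} \ell_t(x_t) \;-\; \min_{u \in \mathcal{X}} \sum_{t=t_k}^{t_{k+1}-1} \ell_t(u)

% Switching regret relative to the segmentation (t_1, \dots, t_m)
R_{\mathrm{switch}}(t_1, \dots, t_m) = \sum_{k=1}^{m} R_k
```

With a single segment ($m = 1$) this reduces to the usual static regret over all $T$ trials.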

[250]  arXiv:2405.20829 [pdf, other]
Title: Rethinking Open-World Semi-Supervised Learning: Distribution Mismatch and Inductive Inference
Comments: CVPR Workshop on Computer Vision in the Wild (CVinW), 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Open-world semi-supervised learning (OWSSL) extends conventional semi-supervised learning to open-world scenarios by taking account of novel categories in unlabeled datasets. Despite the recent advancements in OWSSL, the success often relies on the assumptions that 1) labeled and unlabeled datasets share the same balanced class prior distribution, which does not generally hold in real-world applications, and 2) unlabeled training datasets are utilized for evaluation, where such transductive inference might not adequately address challenges in the wild. In this paper, we aim to generalize OWSSL by addressing both assumptions. Our work suggests that practical OWSSL may require different training settings, evaluation methods, and learning strategies compared to those prevalent in the existing literature.

[251]  arXiv:2405.20830 [pdf, other]
Title: Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Traditional language model alignment methods, such as Direct Preference Optimization (DPO), are limited by their dependence on static, pre-collected paired preference data, which hampers their adaptability and practical applicability. To overcome this limitation, we introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data. Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation. Specifically, we employ an Exponential Moving Average (EMA) model in conjunction with a replay buffer to enable dynamic updates of response segments, effectively integrating real-time feedback with insights from historical data. Our comprehensive evaluations of the LLaMA3-8B and Mistral-7B models across benchmarks, including the Open LLM Leaderboard, IFEval, AlpacaEval 2.0, and MT-Bench, demonstrate that SAPO matches or surpasses established offline contrastive baselines, such as DPO and Odds Ratio Preference Optimization, and outperforms offline self-play methods like SPIN. Our code is available at https://github.com/yinyueqin/SAPO
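
The two generic building blocks named in this abstract, an exponential moving average of model parameters and a fixed-capacity replay buffer, can be sketched as below. This is a minimal illustration and not the authors' implementation: the names `ema_update` and `ReplayBuffer`, the decay value, and the use of plain float lists in place of model weights are all assumptions.

```python
import random
from collections import deque


def ema_update(ema_params, policy_params, decay=0.99):
    """Return new EMA parameters: decay * old_ema + (1 - decay) * current policy."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, policy_params)]


class ReplayBuffer:
    """Fixed-capacity FIFO buffer; the oldest items are dropped when it is full."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def add(self, item):
        self._items.append(item)

    def sample(self, k, rng=random):
        # Sample without replacement, never asking for more than is stored.
        items = list(self._items)
        return rng.sample(items, min(k, len(items)))

    def __len__(self):
        return len(self._items)


# Hypothetical usage: strings stand in for generated response pairs.
buffer = ReplayBuffer(capacity=2)
for pair in ["pair-0", "pair-1", "pair-2"]:
    buffer.add(pair)
batch = buffer.sample(2)
ema = ema_update([0.0, 1.0], [1.0, 1.0], decay=0.9)
```

The EMA copy changes more slowly than the policy it tracks, and the buffer lets older generations be revisited, which is the general sense in which such a pipeline is off-policy.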

[252]  arXiv:2405.20833 [pdf, other]
Title: That's Optional: A Contemporary Exploration of "that" Omission in English Subordinate Clauses
Authors: Ella Rabinovich
Comments: ACL2024 (main conference), 8 pages
Subjects: Computation and Language (cs.CL)

The Uniform Information Density (UID) hypothesis posits that speakers optimize the communicative properties of their utterances by avoiding spikes in information, thereby maintaining a relatively uniform information profile over time. This paper investigates the impact of UID principles on syntactic reduction, specifically focusing on the optional omission of the connector "that" in English subordinate clauses. Building upon previous research, we extend our investigation to a larger corpus of written English, utilize contemporary large language models (LLMs), and extend the information-uniformity principles with the notion of entropy, to estimate the UID manifestations in the use case of syntactic reduction choices.

[253]  arXiv:2405.20834 [pdf, other]
Title: Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Large language models equipped with retrieval-augmented generation (RAG) represent a burgeoning field aimed at enhancing answering capabilities by leveraging external knowledge bases. Although the application of RAG with language-only models has been extensively explored, its adaptation into multimodal vision-language models remains nascent. Going beyond mere answer generation, the primary goal of multimodal RAG is to cultivate the models' ability to reason in response to relevant queries. To this end, we introduce a novel multimodal RAG framework named RMR (Retrieval Meets Reasoning). The RMR framework employs a bi-modal retrieval module to identify the most relevant question-answer pairs, which then serve as scaffolds for the multimodal reasoning process. This training-free approach not only encourages the model to engage deeply with the reasoning processes inherent in the retrieved content but also facilitates the generation of answers that are precise and richly interpretable. Surprisingly, utilizing solely the ScienceQA dataset, collected from elementary and high school science curricula, RMR significantly boosts the performance of various vision-language models across a spectrum of benchmark datasets, including A-OKVQA, MMBench, and SEED. These outcomes highlight the substantial potential of our multimodal retrieval and reasoning mechanism to improve the reasoning capabilities of vision-language models.

[254]  arXiv:2405.20835 [pdf, other]
Title: Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Post-Training Quantization (PTQ) enhances the efficiency of Large Language Models (LLMs) by enabling faster operation and compatibility with more accessible hardware through reduced memory usage, at the cost of small performance drops. We explore the role of calibration sets in PTQ, specifically their effect on hidden activations in various notable open-source LLMs. Calibration sets are crucial for evaluating activation magnitudes and identifying outliers, which can distort the quantization range and negatively impact performance. Our analysis reveals a marked contrast in quantization effectiveness across models. The older OPT model, which much of the quantization literature is based on, shows significant performance deterioration and high susceptibility to outliers with varying calibration sets. In contrast, newer models like Llama-2 7B, Llama-3 8B, Command-R 35B, and Mistral 7B demonstrate strong robustness, with Mistral 7B showing near-immunity to outliers and stable activations. These findings suggest a shift in PTQ strategies might be needed. As advancements in pre-training methods reduce the relevance of outliers, there is an emerging need to reassess the fundamentals of current quantization literature. The emphasis should pivot towards optimizing inference speed, rather than primarily focusing on outlier preservation, to align with the evolving characteristics of state-of-the-art LLMs.
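
The claim that outliers "distort the quantization range" can be illustrated with a toy example. The sketch below uses symmetric absmax int8 quantization on synthetic values; the scheme, the function names, and the numbers are assumptions chosen for illustration and are not taken from the paper's experiments.

```python
def absmax_scale(calibration, bits=8):
    """Symmetric absmax scale: the largest magnitude maps to the top integer level."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    return max(abs(v) for v in calibration) / qmax


def fake_quantize(values, scale):
    """Round each value to the nearest quantization level, then map back to floats."""
    return [round(v / scale) * scale for v in values]


def mean_abs_error(values, scale):
    dequantized = fake_quantize(values, scale)
    return sum(abs(a - b) for a, b in zip(values, dequantized)) / len(values)


# Synthetic "activations" spread evenly over [-0.5, 0.5].
activations = [0.01 * i for i in range(-50, 51)]

# The scale is calibrated once without and once with a single large outlier.
clean_scale = absmax_scale(activations)
outlier_scale = absmax_scale(activations + [60.0])

err_clean = mean_abs_error(activations, clean_scale)
err_outlier = mean_abs_error(activations, outlier_scale)
```

A single outlier in the calibration data stretches the scale so that most ordinary values collapse onto a handful of levels, which is why `err_outlier` comes out far larger than `err_clean` in this toy setting.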

[255]  arXiv:2405.20836 [pdf, other]
Title: Solving partial differential equations with sampled neural networks
Comments: 16 pages, 15 figures
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)

Approximation of solutions to partial differential equations (PDE) is an important problem in computational science and engineering. Using neural networks as an ansatz for the solution has proven a challenge in terms of training time and approximation accuracy. In this contribution, we discuss how sampling the hidden weights and biases of the ansatz network from data-agnostic and data-dependent probability distributions allows us to progress on both challenges. In most examples, the random sampling schemes outperform iterative, gradient-based optimization of physics-informed neural networks regarding training time and accuracy by several orders of magnitude. For time-dependent PDE, we construct neural basis functions only in the spatial domain and then solve the associated ordinary differential equation with classical methods from scientific computing over a long time horizon. This alleviates one of the greatest challenges for neural PDE solvers because it does not require us to parameterize the solution in time. For second-order elliptic PDE in Barron spaces, we prove the existence of sampled networks with $L^2$ convergence to the solution. We demonstrate our approach on several time-dependent and static PDEs. We also illustrate how sampled networks can effectively solve inverse problems in this setting. Benefits compared to common numerical schemes include spectral convergence and mesh-free construction of basis functions.

[256]  arXiv:2405.20838 [pdf, other]
Title: einspace: Searching for Neural Architectures from Fundamental Operations
Comments: Project page at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Neural architecture search (NAS) finds high performing networks for a given task. Yet the results of NAS are fairly prosaic; they did not e.g. create a shift from convolutional structures to transformers. This is not least because the search spaces in NAS often aren't diverse enough to include such transformations a priori. Instead, for NAS to provide greater potential for fundamental design shifts, we need a novel expressive search space design which is built from more fundamental operations. To this end, we introduce einspace, a search space based on a parameterised probabilistic context-free grammar. Our space is versatile, supporting architectures of various sizes and complexities, while also containing diverse network operations which allow it to model convolutions, attention components and more. It contains many existing competitive architectures, and provides flexibility for discovering new ones. Using this search space, we perform experiments to find novel architectures as well as improvements on existing ones on the diverse Unseen NAS datasets. We show that competitive architectures can be obtained by searching from scratch, and we consistently find large improvements when initialising the search with strong baselines. We believe that this work is an important advancement towards a transformative NAS paradigm where search space expressivity and strategic search initialisation play key roles.

[257]  arXiv:2405.20842 [pdf, ps, other]
Title: Compositional Reversible Computation
Comments: 18 pages
Journal-ref: Reversible Computation, LNCS 14680:10-27, 2024
Subjects: Logic in Computer Science (cs.LO)

Reversible computing is motivated by both pragmatic and foundational considerations arising from a variety of disciplines. We take a particular path through the development of reversible computation, emphasizing compositional reversible computation. We start from a historical perspective, by reviewing those approaches that developed reversible extensions of lambda-calculi, Turing machines, and communicating process calculi. These approaches share a common challenge: computations made reversible in this way do not naturally compose locally.
We then turn our attention to computational models that eschew the detour via existing irreversible models. Building on an original analysis by Landauer, the insights of Bennett, Fredkin, and Toffoli introduced a fresh approach to reversible computing in which reversibility is elevated to the status of the main design principle. These initial models are expressed using low-level bit manipulations, however.
Abstracting from the low level of the Bennett-Fredkin-Toffoli models and pursuing more intrinsic, typed, and algebraic models naturally leads to rig categories as the canonical model for compositional reversible programming. The categorical model reveals connections to type isomorphisms, symmetries, permutations, groups, and univalent universes. This, in turn, paves the way for extensions to reversible programming based on monads and arrows. These extensions are shown to recover conventional irreversible programming, a variety of reversible computational effects, and more interestingly both pure (measurement-free) and measurement-based quantum programming.

[258]  arXiv:2405.20846 [pdf, other]
Title: Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models
Comments: Accepted to the main conference ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Image-based advertisements are complex multimodal stimuli that often contain unusual visual elements and figurative language. Previous research on automatic ad understanding has reported impressive zero-shot accuracy of contrastive vision-and-language models (VLMs) on an ad-explanation retrieval task. Here, we examine the original task setup and show that contrastive VLMs can solve it by exploiting grounding heuristics. To control for this confound, we introduce TRADE, a new evaluation test set with adversarial grounded explanations. While these explanations look implausible to humans, we show that they "fool" four different contrastive VLMs. Our findings highlight the need for an improved operationalisation of automatic ad understanding that truly evaluates VLMs' multimodal reasoning abilities. We make our code and TRADE available at https://github.com/dmg-illc/trade .

[259]  arXiv:2405.20847 [pdf, ps, other]
Title: Proportionally dense subgraphs of maximum size in degree-constrained graphs
Subjects: Computational Complexity (cs.CC)

A proportionally dense subgraph (PDS) of a graph is an induced subgraph of size at least two such that every vertex in the subgraph has proportionally as many neighbors inside as outside of the subgraph. Then, maxPDS is the problem of determining a PDS of maximum size in a given graph. If we further require that a PDS induces a connected subgraph, we refer to this problem as connected maxPDS. In this paper, we study the complexity of maxPDS with respect to parameters representing the density of a graph and its complement. We consider $\Delta$, representing the maximum degree, $h$, representing the $h$-index, and degen, representing the degeneracy of a graph. We show that maxPDS is NP-hard parameterized by $\Delta$, $h$ and degen. More specifically, we show that maxPDS is NP-hard on graphs with $\Delta=4$, $h=4$ and degen=2. Then, we show that maxPDS is NP-hard when restricted to dense graphs, more specifically graphs $G$ such that $\Delta(\overline{G})\leq 6$, and graphs $G$ such that $degen(\overline{G}) \leq 2$ and $\overline{G}$ is bipartite, where $\overline{G}$ represents the complement of $G$. On the other hand, we show that maxPDS is polynomial-time solvable on graphs with $h\le2$. Finally, we consider graphs $G$ such that $h(\overline{G})\le 2$ and show that there exists a polynomial-time algorithm for finding a PDS of maximum size in such graphs. This result implies polynomial-time complexity on graphs with $n$ vertices of minimum degree $n-3$, i.e. graphs $G$ such that $\Delta(\overline{G})\le 2$. For each result presented in this paper, we consider connected maxPDS and explain how to extend it when we require connectivity.
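
To make the PDS condition concrete, here is a small checker under one assumed formalization: for a proper vertex subset $S$ with $|S| \ge 2$, every $u \in S$ must satisfy $d_S(u)/(|S|-1) \ge d_{V \setminus S}(u)/|V \setminus S|$. The abstract gives the definition only in words, so this inequality, the requirement that $S$ be a proper subset, and the name `is_pds` are assumptions made for illustration.

```python
def is_pds(adj, subset):
    """Check the assumed PDS condition for `subset` in a graph given as {vertex: set_of_neighbors}."""
    s = set(subset)
    inside_others = len(s) - 1          # number of other vertices inside S
    outside_total = len(adj) - len(s)   # number of vertices outside S
    if len(s) < 2 or outside_total == 0:
        return False
    for u in s:
        d_in = sum(1 for v in adj[u] if v in s)
        d_out = len(adj[u]) - d_in
        # Cross-multiplied form of d_in / inside_others >= d_out / outside_total.
        if d_in * outside_total < d_out * inside_others:
            return False
    return True


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}

triangle_ok = is_pds(graph, {0, 1, 2})   # each vertex has most of its neighbors inside
bridge_ok = is_pds(graph, {2, 3})        # 1/1 inside versus 2/4 outside for both endpoints
far_pair_ok = is_pds(graph, {0, 3})      # no edge between 0 and 3, so the condition fails
```

Checking one subset takes time linear in the degrees of its vertices; the hardness results in the abstract concern finding a maximum-size subset, not verifying a given one.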

[260]  arXiv:2405.20848 [pdf, other]
Title: SLIM: a Scalable Light-weight Root Cause Analysis for Imbalanced Data in Microservice
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

A newly deployed service, one kind of change service, could lead to a new type of minority fault. Existing state-of-the-art methods for fault localization rarely consider the imbalanced fault classification in change service. This paper proposes a novel method that utilizes decision rule sets to deal with highly imbalanced data by optimizing the F1 score subject to cardinality constraints. The proposed method greedily generates the rule with maximal marginal gain and uses an efficient minorize-maximization (MM) approach to select rules iteratively, maximizing a non-monotone submodular lower bound. Compared with existing fault localization algorithms, our algorithm can adapt to the imbalanced fault scenario of change service, and provide interpretable fault causes which are easy to understand and verify. Our method can also be deployed in the online training setting, with only about 15% training overhead compared to the current SOTA methods. Empirical studies show that our algorithm outperforms existing fault localization algorithms in both accuracy and model interpretability.

[261]  arXiv:2405.20849 [pdf, ps, other]
Title: Locally Stationary Distributions: A Framework for Analyzing Slow-Mixing Markov Chains
Comments: 34 pages
Subjects: Data Structures and Algorithms (cs.DS); Probability (math.PR)

Many natural Markov chains fail to mix to their stationary distribution in polynomially many steps. Often, this slow mixing is inevitable since it is computationally intractable to sample from their stationary measure.
Nevertheless, Markov chains can be shown to always converge quickly to measures that are *locally stationary*, i.e., measures that don't change over a small number of steps. These locally stationary measures are analogous to local minima in continuous optimization, while stationary measures correspond to global minima.
While locally stationary measures can be statistically far from stationary measures, do they enjoy provable theoretical guarantees that have algorithmic implications? We study this question in this work and demonstrate three algorithmic applications of locally stationary measures:
1. We show that Glauber dynamics on the hardcore model can be used to find independent sets of size $\Omega\left(\frac{\log d}{d} \cdot n\right)$ in triangle-free graphs of degree at most $d$.
2. Let $W$ be a symmetric real matrix with bounded spectral diameter and $v$ be a unit vector. Given the matrix $M = \lambda vv^\top + W$ with a planted rank-one spike along vector $v$, for sufficiently large constant $\lambda$, Glauber dynamics on the Ising model defined by $M$ samples vectors $x \in \{\pm 1\}^n$ that have constant correlation with the vector $v$.
3. Let $M = A_{\mathbf{G}} - \frac{d}{n}\mathbf{1}\mathbf{1}^\top$ be a centered version of the adjacency matrix where the graph $\mathbf{G}$ is drawn from a sparse 2-community stochastic block model. We show that for sufficiently large constant $\lambda$, Glauber dynamics on the Ising model defined by $M$ samples vectors $x \in \{\pm 1\}^n$ that have constant correlation with the hidden community vector $\mathbf{\sigma}$.

[262]  arXiv:2405.20850 [pdf, other]
Title: Improving Reward Models with Synthetic Critiques
Subjects: Computation and Language (cs.CL)

Reward models (RM) play a critical role in aligning language models through the process of reinforcement learning from human feedback. RMs are trained to predict a score reflecting human preference, which requires significant time and cost for human annotation. Additionally, RMs tend to quickly overfit on superficial features in the training set, hindering their generalization performance on unseen distributions. We propose a novel approach using synthetic natural language critiques generated by large language models to provide additional feedback, evaluating aspects such as instruction following, correctness, and style. This offers richer signals and more robust features for RMs to assess and score on. We demonstrate that high-quality critiques improve the performance and data efficiency of RMs initialized from different pretrained models. Conversely, we also show that low-quality critiques negatively impact performance. Furthermore, incorporating critiques enhances the interpretability and robustness of RM training.

[263]  arXiv:2405.20851 [pdf, other]
Title: MegActor: Harness the Power of Raw Video for Vivid Portrait Animation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Although raw driving videos contain richer information on facial expressions than intermediate representations such as landmarks in the field of portrait animation, they are seldom the subject of research. This is due to two challenges inherent in portrait animation driven with raw videos: 1) significant identity leakage; 2) irrelevant background and facial details such as wrinkles degrade performance. To harness the power of raw videos for vivid portrait animation, we propose a pioneering conditional diffusion model named MegActor. First, we introduce a synthetic data generation framework for creating videos with consistent motion and expressions but inconsistent IDs to mitigate the issue of ID leakage. Second, we segment the foreground and background of the reference image and employ CLIP to encode the background details. This encoded information is then integrated into the network via a text embedding module, thereby ensuring the stability of the background. Finally, we further style-transfer the appearance of the reference image to the driving video to eliminate the influence of facial details in the driving videos. Our final model was trained solely on public datasets, achieving results comparable to commercial models. We hope this will help the open-source community. The code is available at https://github.com/megvii-research/MegFaceAnimate.

[264]  arXiv:2405.20852 [pdf, other]
Title: Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning
Subjects: Computation and Language (cs.CL)

Spoken language understanding (SLU) is a core task in task-oriented dialogue systems, which aims at understanding the user's current goal through constructing semantic frames. SLU usually consists of two subtasks: intent detection and slot filling. Although some SLU frameworks jointly model the two subtasks and achieve high performance, most of them still overlook the inherent relationships between intents and slots and fail to achieve mutual guidance between the two subtasks. To solve the problem, we propose a multi-level multi-grained SLU framework, MMCL, which applies contrastive learning at three levels (utterance level, slot level, and word level) to enable intent and slot to mutually guide each other. For the utterance level, our framework implements coarse granularity contrastive learning and fine granularity contrastive learning simultaneously. We also apply the self-distillation method to improve the robustness of the model. Experimental results and further analysis demonstrate that our proposed model achieves new state-of-the-art results on two public multi-intent SLU datasets, obtaining a 2.6 overall accuracy improvement on the MixATIS dataset compared to previous best models.

[265]  arXiv:2405.20853 [pdf, other]
Title: MeshXL: Neural Coordinate Field for Generative 3D Foundation Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, and is widely preferred in various applications. However, given its unstructured graph representation, the direct generation of high-fidelity 3D meshes is challenging. Fortunately, with a pre-defined ordering strategy, 3D meshes can be represented as sequences, and the generation process can be seamlessly treated as an auto-regressive problem. In this paper, we validate that the Neural Coordinate Field (NeurCF), an explicit coordinate representation with implicit neural embeddings, is a simple-yet-effective representation for large-scale sequential mesh modeling. We then present MeshXL, a family of generative pre-trained auto-regressive models that addresses 3D mesh generation with modern large language model approaches. Extensive experiments show that MeshXL is able to generate high-quality 3D meshes and can also serve as a foundation model for various downstream applications.

[266]  arXiv:2405.20858 [pdf, other]
Title: CSDO: Enhancing Efficiency and Success in Large-Scale Multi-Vehicle Trajectory Planning
Comments: 8 pages, 7 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO)

This paper presents an efficient algorithm, named Centralized Searching and Decentralized Optimization (CSDO), to find feasible solutions for the large-scale Multi-Vehicle Trajectory Planning (MVTP) problem. Due to the intractable growth of non-convex constraints with the number of agents, exploring various homotopy classes that imply different convex domains is crucial for finding a feasible solution. However, existing methods struggle to explore various homotopy classes efficiently because they couple this exploration with the time-consuming search for a precise trajectory solution. CSDO addresses this limitation by separating the two into different levels and integrating an efficient Multi-Agent Path Finding (MAPF) algorithm to search homotopy classes. It first searches for a coarse initial guess using a large search step, identifying a specific homotopy class. Subsequent decentralized Quadratic Programming (QP) refinement processes this guess, resolving minor collisions efficiently. Experimental results demonstrate that CSDO outperforms existing MVTP algorithms in large-scale, high-density scenarios, achieving up to a 95% success rate in 50m $\times$ 50m random scenarios in around one second. Source code is released at https://github.com/YangSVM/CSDOTrajectoryPlanning.

[267]  arXiv:2405.20859 [pdf, other]
Title: clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
Comments: under review
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

It has been established in recent work that Large Language Models (LLMs) can be prompted to "self-play" conversational games that probe certain capabilities (general instruction following, strategic goal orientation, language understanding abilities), where the resulting interactive game play can be automatically scored. In this paper, we take one of the proposed frameworks for setting up such game-play environments, and further test its usefulness as an evaluation instrument, along a number of dimensions: We show that it can easily keep up with new developments while avoiding data contamination, we show that the tests implemented within it are not yet saturated (human performance is substantially higher than that of even the best models), and we show that it lends itself to investigating additional questions, such as the impact of the prompting language on performance. We believe that the approach forms a good basis for making decisions on model choice for building applied interactive systems, and perhaps ultimately setting up a closed-loop development environment of system and simulated evaluator.

[268]  arXiv:2405.20860 [pdf, other]
Title: Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation
Subjects: Machine Learning (cs.LG)

Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints. However, safe RL often suffers from sample inefficiency, requiring extensive interactions with the environment to learn a safe policy. We propose Efficient Safe Policy Optimization (ESPO), a novel approach that enhances the efficiency of safe RL through sample manipulation. ESPO employs an optimization framework with three modes: maximizing rewards, minimizing costs, and balancing the trade-off between the two. By dynamically adjusting the sampling process based on the observed conflict between reward and safety gradients, ESPO theoretically guarantees convergence, optimization stability, and improved sample complexity bounds. Experiments on the Safety-MuJoCo and Omnisafe benchmarks demonstrate that ESPO significantly outperforms existing primal-based and primal-dual-based baselines in terms of reward maximization and constraint satisfaction. Moreover, ESPO achieves substantial gains in sample efficiency, requiring 25--29% fewer samples than baselines, and reduces training time by 21--38%.

[269]  arXiv:2405.20861 [pdf, other]
Title: Maximum Bipartite Matching in $n^{2+o(1)}$ Time via a Combinatorial Algorithm
Subjects: Data Structures and Algorithms (cs.DS)

Maximum bipartite matching (MBM) is a fundamental problem in combinatorial optimization with a long and rich history. A classic result of Hopcroft and Karp (1973) provides an $O(m \sqrt{n})$-time algorithm for the problem, where $n$ and $m$ are the number of vertices and edges in the input graph, respectively. For dense graphs, an approach based on fast matrix multiplication achieves a running time of $O(n^{2.371})$. For several decades, these results represented state-of-the-art algorithms, until, in 2013, Madry introduced a powerful new approach for solving MBM using continuous optimization techniques. This line of research led to several spectacular results, culminating in a breakthrough $m^{1+o(1)}$-time algorithm for min-cost flow, that implies an $m^{1+o(1)}$-time algorithm for MBM as well.
These striking advances naturally raise the question of whether combinatorial algorithms can match the performance of the algorithms that are based on continuous techniques for MBM. A recent work of the authors (2024) made progress on this question by giving a combinatorial $\tilde{O}(m^{1/3}n^{5/3})$-time algorithm for MBM, thus outperforming both the Hopcroft-Karp algorithm and matrix multiplication based approaches on sufficiently dense graphs. Still, a large gap remains between the running time of their algorithm and the almost-linear time achievable by algorithms based on continuous techniques. In this work, we take another step towards narrowing this gap, and present a randomized $n^{2+o(1)}$-time combinatorial algorithm for MBM. Thus, in dense graphs, our algorithm essentially matches the performance of algorithms that are based on continuous methods. We also obtain a randomized $n^{2+o(1)}$-time combinatorial algorithm for maximum vertex-capacitated $s$-$t$ flow in directed graphs when all vertex capacities are identical, using a standard reduction from this problem to MBM.
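
For orientation, the classic Hopcroft-Karp baseline cited in the abstract can be sketched as follows. This is a minimal illustration of the textbook $O(m \sqrt{n})$ algorithm, not the paper's new $n^{2+o(1)}$ method; the function name `hopcroft_karp` and the adjacency-list input format are assumptions made for this example.

```python
from collections import deque

INF = float("inf")

def hopcroft_karp(adj, n_left, n_right):
    """Maximum bipartite matching (textbook Hopcroft-Karp sketch).

    adj[u] lists the right-side vertices adjacent to left vertex u.
    Returns (matching size, match_l), where match_l[u] is the partner of u or -1.
    """
    match_l = [-1] * n_left
    match_r = [-1] * n_right
    dist = [INF] * n_left
    dist_nil = INF  # BFS layer at which a free right vertex is first reached

    def bfs():
        # Layer the left vertices by alternating-path distance from free left vertices.
        nonlocal dist_nil
        queue = deque()
        for u in range(n_left):
            if match_l[u] == -1:
                dist[u] = 0
                queue.append(u)
            else:
                dist[u] = INF
        dist_nil = INF
        while queue:
            u = queue.popleft()
            if dist[u] < dist_nil:
                for v in adj[u]:
                    w = match_r[v]
                    if w == -1:
                        if dist_nil == INF:
                            dist_nil = dist[u] + 1
                    elif dist[w] == INF:
                        dist[w] = dist[u] + 1
                        queue.append(w)
        return dist_nil != INF

    def dfs(u):
        # Follow only edges that respect the BFS layering (shortest augmenting paths).
        for v in adj[u]:
            w = match_r[v]
            if w == -1:
                ok = dist_nil == dist[u] + 1
            else:
                ok = dist[w] == dist[u] + 1 and dfs(w)
            if ok:
                match_l[u] = v
                match_r[v] = u
                return True
        dist[u] = INF  # dead end: prune u for the rest of this phase
        return False

    size = 0
    while bfs():
        for u in range(n_left):
            if match_l[u] == -1 and dfs(u):
                size += 1
    return size, match_l
```

Calling `hopcroft_karp([[0, 1], [0], [1, 2]], 3, 3)` runs the algorithm on a small graph with three vertices per side. The recursive search is kept for brevity; very long augmenting paths would need an iterative version.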

[270]  arXiv:2405.20862 [pdf, other]
Title: BackdoorIndicator: Leveraging OOD Data for Proactive Backdoor Detection in Federated Learning
Authors: Songze Li, Yanbo Dai
Subjects: Cryptography and Security (cs.CR)

In a federated learning (FL) system, decentralized data owners (clients) upload their locally trained models to a central server to jointly train a global model. Malicious clients may plant backdoors into the global model by uploading poisoned local models, causing misclassification to a target class when encountering attacker-defined triggers. Existing backdoor defenses show inconsistent performance under different system and adversarial settings, especially when the malicious updates are made statistically close to the benign ones. In this paper, we first reveal that planting subsequent backdoors with the same target label can significantly help to maintain the accuracy of previously planted backdoors. We then propose a novel proactive backdoor detection mechanism for FL named BackdoorIndicator, which has the server inject indicator tasks into the global model leveraging out-of-distribution (OOD) data. Utilizing the fact that any backdoor samples are OOD samples with respect to benign samples, the server, which is completely agnostic of the potential backdoor types and target labels, can accurately detect the presence of backdoors in uploaded models by evaluating the indicator tasks. We perform systematic and extensive empirical studies to demonstrate the consistently superior performance and practicality of BackdoorIndicator over baseline defenses, across a wide range of system and adversarial settings.

[271]  arXiv:2405.20867 [pdf, other]
Title: Automatic Channel Pruning for Multi-Head Attention
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)

Despite the strong performance of Transformers, their quadratic computational complexity presents challenges in applying them to vision tasks. Automatic pruning is one of the effective methods for reducing computational complexity without heuristic approaches. However, directly applying it to multi-head attention is not straightforward due to channel misalignment. In this paper, we propose an automatic channel pruning method that takes the multi-head attention mechanism into account. First, we incorporate channel similarity-based weights into the pruning indicator to preserve more informative channels in each head. Then, we adjust the pruning indicator to enforce the removal of channels in equal proportions across all heads, preventing channel misalignment. We also add a reweight module to compensate for the information loss resulting from channel removal, and an effective initialization step for the pruning indicator based on the difference in attention between the original structure and each channel. Our proposed method can be applied not only to the original attention but also to linear attention, which is more efficient owing to its linear complexity with respect to the number of tokens. On ImageNet-1K, applying our pruning method to the FLattenTransformer, which includes both attention mechanisms, yields higher accuracy at several MACs budgets than previous state-of-the-art efficient models and pruning methods. Code will be available soon.

[272]  arXiv:2405.20868 [pdf, other]
Title: Responsible AI for Earth Observation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)

The convergence of artificial intelligence (AI) and Earth observation (EO) technologies has brought geoscience and remote sensing into an era of unparalleled capabilities. AI's transformative impact on data analysis, particularly of data derived from EO platforms, holds great promise in addressing global challenges such as environmental monitoring, disaster response and climate change analysis. However, the rapid integration of AI necessitates a careful examination of the responsible dimensions inherent in its application within these domains. In this paper, we present a pioneering effort to systematically define the intersection of AI and EO, with a central focus on responsible AI practices. Specifically, we identify several critical components guiding this exploration from both academia and industry perspectives within the EO field: AI and EO for social good, mitigating unfair biases, AI security in EO, geo-privacy and privacy-preserving measures, as well as maintaining scientific excellence, open data, and guiding AI usage based on ethical principles. Furthermore, the paper explores potential opportunities and emerging trends, providing valuable insights for future research endeavors.

[273]  arXiv:2405.20869 [pdf, other]
Title: Understanding the Throughput Bounds of Reconfigurable Datacenter Networks
Subjects: Networking and Internet Architecture (cs.NI)

The increasing gap between the growth of datacenter traffic volume and the capacity of electrical switches led to the emergence of reconfigurable datacenter network designs based on optical circuit switching. A multitude of research works, ranging from demand-oblivious (e.g., RotorNet, Sirius) to demand-aware (e.g., Helios, ProjecToR) reconfigurable networks, demonstrate significant performance benefits. Unfortunately, little is formally known about the achievable throughput of such networks. Only recently have the throughput bounds of demand-oblivious networks been studied. In this paper, we tackle a fundamental question: Whether and to what extent can demand-aware reconfigurable networks improve the throughput of datacenters?
This paper attempts to understand the landscape of the throughput bounds of reconfigurable datacenter networks. Given the rise of machine learning workloads and collective communication in modern datacenters, we specifically focus on their typical communication patterns, namely uniform-residual demand matrices. We formally establish a separation bound of demand-aware networks over demand-oblivious networks, proving analytically that the former can provide at least $16\%$ higher throughput. Our analysis further uncovers new design opportunities based on periodic, fixed-duration reconfigurations that can harness the throughput benefits of demand-aware networks while inheriting the simplicity and low reconfiguration overheads of demand-oblivious networks. Finally, our evaluations corroborate the theoretical results of this paper, demonstrating that demand-aware networks significantly outperform oblivious networks in terms of throughput. This work barely scratches the surface and unveils several intriguing open questions, which we discuss at the end of this paper.

[274]  arXiv:2405.20876 [pdf, other]
Title: Investigating Calibration and Corruption Robustness of Post-hoc Pruned Perception CNNs: An Image Classification Benchmark Study
Comments: 11 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks. However, high computational and storage demands hinder their deployment into resource-constrained environments, such as embedded devices. Model pruning helps to meet these restrictions by reducing the model size, while maintaining superior performance. Meanwhile, safety-critical applications pose more than just resource and performance constraints. In particular, predictions must not be overly confident, i.e., provide properly calibrated uncertainty estimations (proper uncertainty calibration), and CNNs must be robust against corruptions like naturally occurring input perturbations (natural corruption robustness). This work investigates the important trade-off between uncertainty calibration, natural corruption robustness, and performance for current state-of-research post-hoc CNN pruning techniques in the context of image classification tasks. Our study reveals that post-hoc pruning substantially improves the model's uncertainty calibration, performance, and natural corruption robustness, sparking hope for safe and robust embedded CNNs. Furthermore, uncertainty calibration and natural corruption robustness are not mutually exclusive targets under pruning, as evidenced by the improved safety aspects obtained by post-hoc unstructured pruning with increasing compression.

[275]  arXiv:2405.20877 [pdf, other]
Title: Waveform Design for Over-the-Air Computing
Comments: 14 pages
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Signal Processing (eess.SP); Statistics Theory (math.ST)

In response to the increasing number of devices anticipated in next-generation networks, a shift toward over-the-air (OTA) computing has been proposed. Leveraging the superposition of multiple access channels, OTA computing enables efficient resource management by supporting simultaneous uncoded transmission in the time and the frequency domain. Thus, to advance the integration of OTA computing, our study presents a theoretical analysis addressing practical issues encountered in current digital communication transceivers, such as time sampling error and intersymbol interference (ISI). To this end, we examine the theoretical mean squared error (MSE) for OTA transmission under time sampling error and ISI, while also exploring methods for minimizing the MSE in the OTA transmission. Utilizing alternating optimization, we also derive optimal power policies for both the devices and the base station. Additionally, we propose a novel deep neural network (DNN)-based approach to design waveforms enhancing OTA transmission performance under time sampling error and ISI. To ensure fair comparison with existing waveforms like the raised cosine (RC) and the better-than-raised-cosine (BTRC), we incorporate a custom loss function integrating energy and bandwidth constraints, along with practical design considerations such as waveform symmetry. Simulation results validate our theoretical analysis and demonstrate performance gains of the designed pulse over RC and BTRC waveforms. To facilitate testing of our results without necessitating the DNN structure recreation, we provide curve fitting parameters for select DNN-based waveforms as well.

[276]  arXiv:2405.20878 [pdf, other]
Title: SelfGNN: Self-Supervised Graph Neural Networks for Sequential Recommendation
Comments: Accepted by SIGIR'24
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Sequential recommendation effectively addresses information overload by modeling users' temporal and sequential interaction patterns. To overcome the limitations of supervision signals, recent approaches have adopted self-supervised learning techniques in recommender systems. However, there are still two critical challenges that remain unsolved. Firstly, existing sequential models primarily focus on long-term modeling of individual interaction sequences, overlooking the valuable short-term collaborative relationships among the behaviors of different users. Secondly, real-world data often contain noise, particularly in users' short-term behaviors, which can arise from temporary intents or misclicks. Such noise negatively impacts the accuracy of both graph and sequence models, further complicating the modeling process. To address these challenges, we propose a novel framework called Self-Supervised Graph Neural Network (SelfGNN) for sequential recommendation. The SelfGNN framework encodes short-term graphs based on time intervals and utilizes Graph Neural Networks (GNNs) to learn short-term collaborative relationships. It captures long-term user and item representations at multiple granularity levels through interval fusion and dynamic behavior modeling. Importantly, our personalized self-augmented learning structure enhances model robustness by mitigating noise in short-term graphs based on long-term user interests and personal stability. Extensive experiments conducted on four real-world datasets demonstrate that SelfGNN outperforms various state-of-the-art baselines. Our model implementation codes are available at https://github.com/HKUDS/SelfGNN.

[277]  arXiv:2405.20879 [pdf, other]
Title: Flow matching achieves minimax optimal convergence
Subjects: Machine Learning (cs.LG)

Flow matching (FM) has gained significant attention as a simulation-free generative model. Unlike diffusion models, which are based on stochastic differential equations, FM employs a simpler approach by solving an ordinary differential equation with an initial condition from a normal distribution, thus streamlining the sample generation process. This paper discusses the convergence properties of FM in terms of the $p$-Wasserstein distance, a measure of distributional discrepancy. We establish that FM can achieve the minimax optimal convergence rate for $1 \leq p \leq 2$, presenting the first theoretical evidence that FM can reach convergence rates comparable to those of diffusion models. Our analysis extends existing frameworks by examining a broader class of mean and variance functions for the vector fields and identifies specific conditions necessary to attain these optimal rates.

[278]  arXiv:2405.20880 [pdf, other]
Title: Paying to Do Better: Games with Payments between Learning Agents
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Theoretical Economics (econ.TH)

In repeated games, such as auctions, players typically use learning algorithms to choose their actions. The use of such autonomous learning agents has become widespread on online platforms. In this paper, we explore the impact of players incorporating monetary transfers into their agents' algorithms, aiming to incentivize behavior in their favor. Our focus is on understanding when players have incentives to make use of monetary transfers, how these payments affect learning dynamics, and what the implications are for welfare and its distribution among the players. We propose a simple game-theoretic model to capture such scenarios. Our results on general games show that in a broad class of games, players benefit from letting their learning agents make payments to other learners during the game dynamics, and that in many cases, this kind of behavior improves welfare for all players. Our results on first- and second-price auctions show that in equilibria of the ``payment policy game,'' the agents' dynamics can reach strong collusive outcomes with low revenue for the auctioneer. These results highlight a challenge for mechanism design in systems where automated learning agents can benefit from interacting with their peers outside the boundaries of the mechanism.

[279]  arXiv:2405.20881 [pdf, other]
Title: S4Fusion: Saliency-aware Selective State Space Model for Infrared Visible Image Fusion
Comments: NeurIPS, under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

As one of the tasks in Image Fusion, Infrared and Visible Image Fusion aims to integrate complementary information captured by sensors of different modalities into a single image. The Selective State Space Model (SSSM), known for its ability to capture long-range dependencies, has demonstrated its potential in the field of computer vision. However, in image fusion, current methods underestimate the potential of SSSM in capturing the global spatial information of both modalities. This limitation prevents the simultaneous consideration of the global spatial information from both modalities during interaction, leading to a lack of comprehensive perception of salient targets. Consequently, the fusion results tend to bias towards one modality instead of adaptively preserving salient targets. To address this issue, we propose the Saliency-aware Selective State Space Fusion Model (S4Fusion). In our S4Fusion, the designed Cross-Modal Spatial Awareness Module (CMSA) can simultaneously focus on global spatial information from both modalities while facilitating their interaction, thereby comprehensively capturing complementary information. Additionally, S4Fusion leverages a pre-trained network to perceive uncertainty in the fused images. By minimizing this uncertainty, S4Fusion adaptively highlights salient targets from both images. Extensive experiments demonstrate that our approach produces high-quality images and enhances performance in downstream tasks.

[280]  arXiv:2405.20882 [pdf, other]
Title: Sheaf HyperNetworks for Personalized Federated Learning
Comments: 25 pages, 12 figures, 7 tables, pre-print under review
Subjects: Machine Learning (cs.LG)

Graph hypernetworks (GHNs), constructed by combining graph neural networks (GNNs) with hypernetworks (HNs), leverage relational data across various domains such as neural architecture search, molecular property prediction and federated learning. Despite GNNs and HNs being individually successful, we show that GHNs present problems that compromise their performance, such as over-smoothing and heterophily. Moreover, we cannot apply GHNs directly to personalized federated learning (PFL) scenarios, where an a priori client relation graph may be absent, private, or inaccessible. To mitigate these limitations in the context of PFL, we propose a novel class of HNs, sheaf hypernetworks (SHNs), which combine cellular sheaf theory with HNs to improve parameter sharing for PFL. We thoroughly evaluate SHNs across diverse PFL tasks, including multi-class classification, traffic and weather forecasting. Additionally, we provide a methodology for constructing client relation graphs in scenarios where such graphs are unavailable. We show that SHNs consistently outperform existing PFL solutions in complex non-IID scenarios. While the baselines' performance fluctuates depending on the task, SHNs show improvements of up to 2.7% in accuracy and 5.3% lower mean squared error over the best-performing baseline.

[281]  arXiv:2405.20883 [pdf, other]
Title: Scalable Distance-based Multi-Agent Relative State Estimation via Block Multiconvex Optimization
Comments: To appear in Robotics: Science and Systems 2024
Subjects: Robotics (cs.RO)

This paper explores the distance-based relative state estimation problem in large-scale systems, which is hard to solve effectively due to its high-dimensionality and non-convexity. In this paper, we alleviate this inherent hardness to simultaneously achieve scalability and robustness of inference on this problem. Our idea is launched from a universal geometric formulation, called \emph{generalized graph realization}, for the distance-based relative state estimation problem. Based on this formulation, we introduce two collaborative optimization models, one of which is convex and thus globally solvable, and the other enables fast searching on non-convex landscapes to refine the solution offered by the convex one. Importantly, both models enjoy \emph{multiconvex} and \emph{decomposable} structures, allowing efficient and safe solutions using \emph{block coordinate descent} that enjoys scalability and a distributed nature. The proposed algorithms collaborate to demonstrate superior or comparable solution precision to the current centralized convex relaxation-based methods, which are known for their high optimality. Distinctly, the proposed methods demonstrate scalability beyond the reach of previous convex relaxation-based methods. We also demonstrate that the combination of the two proposed algorithms achieves a more robust pipeline than deploying the local search method alone in a continuous-time scenario.

[282]  arXiv:2405.20884 [pdf, other]
Title: Effects of Dataset Sampling Rate for Noise Cancellation through Deep Learning
Comments: 16 pages, 8 pictures, 3 tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Background: Active noise cancellation has been a subject of research for decades. Traditional techniques, like the Fast Fourier Transform, have limitations in certain scenarios. This research explores the use of deep neural networks (DNNs) as a superior alternative. Objective: The study aims to determine the effect that the sampling rate of the training data has on lightweight, efficient DNNs that operate within the processing constraints of mobile devices. Methods: We chose the Conv-TasNet network for its proven efficiency in speech separation and enhancement. Conv-TasNet was trained on datasets such as WHAM!, LibriMix, and the MS-2023 DNS Challenge. The datasets were sampled at rates of 8kHz, 16kHz, and 48kHz to analyze the effect of sampling rate on noise cancellation efficiency and effectiveness. The model was tested on an Intel Core i7 processor from 2023, assessing the network's ability to produce clear audio while filtering out background noise. Results: Models trained at higher sampling rates (48kHz) provided much better evaluation metrics against Total Harmonic Distortion (THD) and Quality Prediction For Generative Neural Speech Codecs (WARP-Q) values, indicating improved audio quality. However, a trade-off was noted, with processing time being longer for higher sampling rates. Conclusions: The Conv-TasNet network, trained on datasets sampled at higher rates like 48kHz, offers a robust solution for mobile devices in achieving noise cancellation through speech separation and enhancement. Future work involves optimizing the model's efficiency further and testing on mobile devices.

[283]  arXiv:2405.20887 [pdf, other]
Title: On the Condition Monitoring of Bolted Joints through Acoustic Emission and Deep Transfer Learning: Generalization, Ordinal Loss and Super-Convergence
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

This paper investigates the use of deep transfer learning based on convolutional neural networks (CNNs) to monitor the condition of bolted joints using acoustic emissions. Bolted structures are critical components in many mechanical systems, and the ability to monitor their condition status is crucial for effective structural health monitoring. We evaluated the performance of our methodology using the ORION-AE benchmark, a structure composed of two thin beams connected by three bolts, where highly noisy acoustic emission measurements were taken to detect changes in the applied tightening torque of the bolts. The data used from this structure are derived by transforming acoustic emission data streams into images using the continuous wavelet transform and leveraging pretrained CNNs for feature extraction and denoising. Our experiments compared single-sensor versus multiple-sensor fusion for estimating the tightening level (loosening) of bolts and evaluated the effect of raw versus prefiltered data on performance. We particularly focused on the generalization capabilities of CNN-based transfer learning across different measurement campaigns, and we studied ordinal loss functions to penalize incorrect predictions less severely when close to the ground truth, thereby encouraging misclassification errors to be in adjacent classes. Network configurations as well as learning rate schedulers are also investigated, and super-convergence is obtained, i.e., high classification accuracy is achieved in a small number of iterations with different networks. Furthermore, results demonstrate the generalization capabilities of CNN-based transfer learning for monitoring bolted structures by acoustic emission with varying amounts of prior information required during training.
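
The abstract does not say which ordinal loss functions were studied. As a minimal sketch of the general idea only, the following uses one common choice, the expected absolute class distance, and assumes that tightening levels are encoded as ordered integers 0, 1, 2, and so on. The function names `softmax` and `expected_distance_loss` are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def expected_distance_loss(logits, targets):
    """Mean expected |predicted class - true class| under the softmax distribution.

    logits:  array of shape (batch, num_classes)
    targets: integer array of shape (batch,), classes assumed ordered 0..num_classes-1
    Probability mass placed on classes adjacent to the target costs less than
    mass placed on distant classes, which is the behaviour described above.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    classes = np.arange(probs.shape[-1])
    distance = np.abs(classes[None, :] - np.asarray(targets)[:, None])
    return float((probs * distance).sum(axis=-1).mean())

# Two confident but wrong predictions for a sample whose true level is 0:
near_miss = np.array([[0.0, 5.0, 0.0, 0.0]])  # predicts the adjacent level 1
far_miss = np.array([[0.0, 0.0, 0.0, 5.0]])   # predicts the distant level 3
target = np.array([0])
loss_near = expected_distance_loss(near_miss, target)
loss_far = expected_distance_loss(far_miss, target)
```

Under this loss, `loss_near` is smaller than `loss_far`, whereas plain cross-entropy would score both mistakes identically.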

[284]  arXiv:2405.20892 [pdf, other]
Title: MALT: Multi-scale Action Learning Transformer for Online Action Detection
Comments: 8 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Online action detection (OAD) aims to identify ongoing actions from streaming video in real-time, without access to future frames. Since these actions manifest at varying scales of granularity, ranging from coarse to fine, projecting an entire set of action frames to a single latent encoding may result in a lack of local information, necessitating the acquisition of action features across multiple scales. In this paper, we propose a multi-scale action learning transformer (MALT), which includes a novel recurrent decoder (used for feature fusion) that has fewer parameters and can be trained more efficiently. A hierarchical encoder with multiple encoding branches is further proposed to capture multi-scale action features. The output from the preceding branch is then incrementally input to the subsequent branch as part of a cross-attention calculation. In this way, output features transition from coarse to fine as the branches deepen. We also introduce an explicit frame scoring mechanism employing sparse attention, which filters irrelevant frames more efficiently, without requiring an additional network. The proposed method achieved state-of-the-art performance on two benchmark datasets (THUMOS'14 and TVSeries), outperforming all existing models used for comparison by 0.2% mAP on THUMOS'14 and 0.1% mcAP on TVSeries.

[285]  arXiv:2405.20895 [pdf, other]
Title: A comparison of correspondence analysis with PMI-based word embedding methods
Subjects: Computation and Language (cs.CL)

Popular word embedding methods such as GloVe and Word2Vec are related to the factorization of the pointwise mutual information (PMI) matrix. In this paper, we link correspondence analysis (CA) to the factorization of the PMI matrix. CA is a dimensionality reduction method that uses singular value decomposition (SVD), and we show that CA is mathematically close to the weighted factorization of the PMI matrix. In addition, we present variants of CA that turn out to be successful in the factorization of the word-context matrix, i.e. CA applied to a matrix where the entries undergo a square-root transformation (ROOT-CA) and a root-root transformation (ROOTROOT-CA). An empirical comparison among CA- and PMI-based methods shows that overall results of ROOT-CA and ROOTROOT-CA are slightly better than those of the PMI-based methods.
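The two factorizations compared in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the choice to set the PMI of zero counts to 0, and the square-root weighting of singular values are assumptions made for illustration. The CA part uses the standard construction (SVD of the standardized residuals of the correspondence matrix), and ROOT-CA is mimicked by applying the same routine to the square-rooted counts.

```python
import numpy as np

def pmi_matrix(counts):
    """Pointwise mutual information of a word-context count matrix."""
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()               # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)      # word marginals
    p_c = p_wc.sum(axis=0, keepdims=True)      # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0               # assumption: zero counts get PMI 0
    return pmi

def ca_residuals(counts):
    """Standardized residuals whose SVD yields correspondence analysis."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    r = p.sum(axis=1, keepdims=True)           # row masses
    c = p.sum(axis=0, keepdims=True)           # column masses
    return (p - r * c) / np.sqrt(r * c)

def svd_embeddings(matrix, dim):
    """Rank-`dim` row embeddings from a truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])       # assumption: sqrt weighting

toy = np.array([[4.0, 0.0, 1.0], [0.0, 3.0, 1.0], [1.0, 1.0, 2.0]])
pmi_vectors = svd_embeddings(pmi_matrix(toy), 2)
root_ca_vectors = svd_embeddings(ca_residuals(np.sqrt(toy)), 2)
```

For a count table whose rows and columns are independent, both `pmi_matrix` and `ca_residuals` return (numerically) zero matrices, which is one way to see why the two approaches are closely related.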

[286]  arXiv:2405.20896 [pdf, other]
Title: SPARROW: Smart Precision Agriculture Robot for Ridding of Weeds
Comments: submitted to 5th INTERNATIONAL CONFERENCE OF EMERGING TECHNOLOGIES 2024, BELGAUM, INDIA
Subjects: Robotics (cs.RO)

Advancements in precision agriculture are vital to support the increasing demand for global food supply. Precision spot spraying is a major step towards reducing chemical usage for pest and weed control in agriculture. A novel spot spraying algorithm that autonomously detects weeds and performs trajectory planning for the sprayer nozzle has been proposed. Furthermore, this research introduces a vision-based autonomous navigation system that operates through the detected crop row, effectively synchronizing with an autonomous spraying algorithm. The proposed system is characterized by its cost-effectiveness, which enables the autonomous spraying of herbicides onto detected weeds.

[287]  arXiv:2405.20900 [pdf, other]
Title: Large Language Models: A New Approach for Privacy Policy Analysis at Scale
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

The number and dynamic nature of web and mobile applications present significant challenges for assessing their compliance with data protection laws. In this context, symbolic and statistical Natural Language Processing (NLP) techniques have been employed for the automated analysis of these systems' privacy policies. However, these techniques typically require labor-intensive and potentially error-prone manually annotated datasets for training and validation. This research proposes the application of Large Language Models (LLMs) as an alternative for effectively and efficiently extracting privacy practices from privacy policies at scale. In particular, we leverage well-known LLMs such as ChatGPT and Llama 2, and offer guidance on the optimal design of prompts, parameters, and models, incorporating advanced strategies such as few-shot learning. We further illustrate its capability to detect detailed and varied privacy practices accurately. Using several renowned datasets in the domain as a benchmark, our evaluation validates its exceptional performance, achieving an F1 score exceeding 93%. Moreover, it does so with reduced costs, faster processing times, and fewer technical knowledge requirements. Consequently, we advocate for LLM-based solutions as a sound alternative to traditional NLP techniques for the automated analysis of privacy policies at scale.

[288]  arXiv:2405.20902 [pdf, other]
Title: Preemptive Answer "Attacks" on Chain-of-Thought Reasoning
Comments: Accepted to ACL'24 (Findings). Camera-ready version
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Large language models (LLMs) showcase impressive reasoning capabilities when coupled with Chain-of-Thought (CoT) prompting. However, the robustness of this approach warrants further investigation. In this paper, we introduce a novel scenario termed preemptive answers, where the LLM obtains an answer before engaging in reasoning. This situation can arise inadvertently or be induced by malicious users through prompt injection attacks. Experiments reveal that preemptive answers significantly impair the model's reasoning capability across various CoT methods and a broad spectrum of datasets. To bolster the robustness of reasoning, we propose two measures aimed at mitigating this issue to some extent.

[289]  arXiv:2405.20905 [pdf, other]
Title: VENI, VINDy, VICI: a variational reduced-order modeling framework with uncertainty quantification
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Dynamical Systems (math.DS)

The simulation of many complex phenomena in engineering and science requires solving expensive, high-dimensional systems of partial differential equations (PDEs). To circumvent this, reduced-order models (ROMs) have been developed to speed up computations. However, when governing equations are unknown or partially known, typically ROMs lack interpretability and reliability of the predicted solutions.
In this work we present a data-driven, non-intrusive framework for building ROMs where the latent variables and dynamics are identified in an interpretable manner and uncertainty is quantified. Starting from a limited amount of high-dimensional, noisy data the proposed framework constructs an efficient ROM by leveraging variational autoencoders for dimensionality reduction along with a newly introduced, variational version of sparse identification of nonlinear dynamics (SINDy), which we refer to as Variational Identification of Nonlinear Dynamics (VINDy).
In detail, the method consists of Variational Encoding of Noisy Inputs (VENI) to identify the distribution of reduced coordinates. Simultaneously, we learn the distribution of the coefficients of a pre-determined set of candidate functions by VINDy. Once trained offline, the identified model can be queried for new parameter instances and new initial conditions to compute the corresponding full-time solutions. The probabilistic setup enables uncertainty quantification as the online testing consists of Variational Inference naturally providing Certainty Intervals (VICI). In this work we showcase the effectiveness of the newly proposed VINDy method in identifying interpretable and accurate dynamical systems for the Rössler system with different noise intensities and sources. Then the performance of the overall method - named VENI, VINDy, VICI - is tested on PDE benchmarks including structural mechanics and fluid dynamics.

[290]  arXiv:2405.20906 [pdf, ps, other]
Title: Enhancing Vision Models for Text-Heavy Content Understanding and Interaction
Comments: 5 pages, 4 figures (including 1 graph)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Interacting with and understanding text-heavy visual content that spans multiple images is a major challenge for traditional vision models. This paper focuses on enhancing vision models' ability to comprehend and learn from images containing large amounts of textual information, such as textbooks and research papers, which include multiple images (e.g., graphs) and tables with different types of axes and scales. The approach involves dataset preprocessing, fine-tuning on instruction-oriented data, and evaluation. We also built a visual chat application integrating CLIP for image encoding and a model from the Massive Text Embedding Benchmark, developed to consider both textual and visual inputs. An accuracy of 96.71% was obtained. The aim of the project is to enhance advanced vision models' capabilities in understanding complex, interconnected visual and textual data, contributing to multimodal AI.

[291]  arXiv:2405.20914 [pdf, other]
Title: RASE: Efficient Privacy-preserving Data Aggregation against Disclosure Attacks for IoTs
Comments: 14 pages, 19 figures
Subjects: Cryptography and Security (cs.CR)

The growing popular awareness of personal privacy raises the following quandary: what is the new paradigm for collecting and protecting the data produced by an ever-increasing number of sensor devices? Most previous studies on co-design of data aggregation and privacy preservation assume that a trusted fusion center adheres to privacy regimes. Very recent work has taken steps towards relaxing the assumption by allowing data contributors to locally perturb their own data. Although these solutions withhold some data content to mitigate privacy risks, they have been shown to offer insufficient protection against disclosure attacks. Aiming at providing a more rigorous data safeguard for the Internet of Things (IoTs), this paper initiates the study of privacy-preserving data aggregation. We propose a novel paradigm (called RASE), which can be generalized into a 3-step sequential procedure: noise addition, followed by random permutation, and then parameter estimation. Specifically, we design a differentially private randomizer, which carefully guides data contributors to obfuscate the truth. Then, a shuffler is employed to receive the noisy data from all data contributors. After that, it breaks the correct linkage between senders and receivers by applying a random permutation. The estimation phase involves using inaccurate data to calculate an approximate aggregate value. Extensive simulations are provided to explore the privacy-utility landscape of our RASE.
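The three-step procedure described above (noise addition, random permutation, estimation) can be sketched as follows. This is a generic illustration, not the RASE randomizer: the abstract does not specify the noise mechanism, so the sketch assumes the textbook Laplace mechanism on values clipped to a known range, and all function names are hypothetical.

```python
import numpy as np

def randomize(value, epsilon, rng, lower=0.0, upper=1.0):
    """Step 1 (on each device): clip, then add Laplace noise.

    Assumption: Laplace mechanism with sensitivity (upper - lower),
    which gives epsilon-local differential privacy for one report.
    """
    clipped = min(max(value, lower), upper)
    scale = (upper - lower) / epsilon
    return clipped + rng.laplace(0.0, scale)

def shuffle(reports, rng):
    """Step 2 (shuffler): random permutation breaks the sender-report link."""
    return rng.permutation(np.asarray(reports, dtype=float))

def estimate_mean(reports):
    """Step 3 (aggregator): zero-mean noise averages out over many reports."""
    return float(np.mean(reports))

rng = np.random.default_rng(0)
true_values = rng.uniform(0.0, 1.0, size=20_000)
noisy = [randomize(v, epsilon=1.0, rng=rng) for v in true_values]
shuffled = shuffle(noisy, rng)
estimate = estimate_mean(shuffled)
```

Because the Laplace noise has zero mean, the average of the shuffled reports is an unbiased estimate of the average clipped value; its variance shrinks as the number of contributors grows and grows as `epsilon` decreases.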

[292]  arXiv:2405.20915 [pdf, other]
Title: Fast yet Safe: Early-Exiting with Risk Control
Comments: 25 pages, 11 figures, 4 tables (incl. appendix)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Scaling machine learning models significantly improves their performance. However, such gains come at the cost of inference being slow and resource-intensive. Early-exit neural networks (EENNs) offer a promising solution: they accelerate inference by allowing intermediate layers to exit and produce a prediction early. Yet a fundamental issue with EENNs is how to determine when to exit without severely degrading performance. In other words, when is it 'safe' for an EENN to go 'fast'? To address this issue, we investigate how to adapt frameworks of risk control to EENNs. Risk control offers a distribution-free, post-hoc solution that tunes the EENN's exiting mechanism so that exits only occur when the output is of sufficient quality. We empirically validate our insights on a range of vision and language tasks, demonstrating that risk control can produce substantial computational savings, all the while preserving user-specified performance goals.
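The idea of tuning an exit threshold post hoc on held-out data can be sketched for a model with one early exit and one final exit. This is a deliberately simplified illustration under stated assumptions: it selects the threshold from the empirical calibration risk only and omits the finite-sample correction that risk-control procedures add to obtain guarantees. The function name, arguments, and threshold grid are hypothetical.

```python
import numpy as np

def pick_exit_threshold(confidence, early_loss, full_loss, max_risk, grid=None):
    """Return (threshold, exit_rate) for the most permissive safe threshold.

    A sample exits early when confidence >= threshold; it then incurs
    early_loss, otherwise full_loss. Lower thresholds exit more often.
    """
    confidence = np.asarray(confidence, dtype=float)
    early_loss = np.asarray(early_loss, dtype=float)
    full_loss = np.asarray(full_loss, dtype=float)
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    for lam in grid:                       # ascending order
        exits = confidence >= lam
        risk = np.where(exits, early_loss, full_loss).mean()
        if risk <= max_risk:
            return float(lam), float(exits.mean())
    return float("inf"), 0.0               # never exit early

# Toy calibration set: the early exit is wrong on the two low-confidence samples.
threshold, exit_rate = pick_exit_threshold(
    confidence=[0.9, 0.8, 0.3, 0.2],
    early_loss=[0.0, 0.0, 1.0, 1.0],
    full_loss=[0.0, 0.0, 0.0, 0.0],
    max_risk=0.0,
)
```

On this toy set the selected threshold lies just above 0.3, so only the two confident samples exit early and the calibration risk stays at zero.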

[293]  arXiv:2405.20916 [pdf, ps, other]
Title: Unravelling the Use of Digital Twins to Assist Decision- and Policy-Making in Smart Cities
Comments: 12 pages
Journal-ref: Proceedings of the 37th Bled eConference (Bled 2024), 2024, 755-763
Subjects: Computers and Society (cs.CY)

This short paper represents a systematic literature review that sets the basis for the future development of a framework for digital twin-based decision support in the public sector, specifically for the smart city domain. The final aim of the research is to model context-specific digital twins for aiding the decision-making processes in smart cities and devise methods for defining the policy agenda. Overall, this short paper provides a foundation, based on the main concepts from existing literature, for further research in the role and applications of urban digital twins to assist decision- and policy-making in smart cities. The existing literature analyses common applications of digital twins in smart city development with a focus on supporting decision- and policy-making. Future work will centre on developing a digital-twin-based sustainable smart city and defining different scenarios concerning challenges of good governance, especially so-called wicked problems, in smaller-scale urban and non-urban contexts.

[294]  arXiv:2405.20917 [pdf, other]
Title: Learning to Estimate System Specifications in Linear Temporal Logic using Transformers and Mamba
Comments: 20 pages, 15 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)

Temporal logic is a framework for representing and reasoning about propositions that evolve over time. It is commonly used for specifying requirements in various domains, including hardware and software systems, as well as robotics. Specification mining or formula generation involves extracting temporal logic formulae from system traces and has numerous applications, such as detecting bugs and improving interpretability. Although there has been a surge of deep learning-based methods for temporal logic satisfiability checking in recent years, the specification mining literature has been lagging behind in adopting deep learning methods despite their many advantages, such as scalability. In this paper, we introduce autoregressive models that can generate linear temporal logic formulae from traces, towards addressing the specification mining problem. We propose multiple architectures for this task: transformer encoder-decoder, decoder-only transformer, and Mamba, which is an emerging alternative to transformer models. Additionally, we devise a metric for quantifying the distinctiveness of the generated formulae and a straightforward algorithm to enforce the syntax constraints. Our experiments show that the proposed architectures yield promising results, generating correct and distinct formulae at a fraction of the compute cost needed for the combinatorial baseline.

[295]  arXiv:2405.20918 [pdf, other]
Title: Flexible inference in heterogeneous and attributed multilayer networks
Subjects: Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

Networked datasets are often enriched by different types of information about individual nodes or edges. However, most existing methods for analyzing such datasets struggle to handle the complexity of heterogeneous data, often requiring substantial model-specific analysis. In this paper, we develop a probabilistic generative model to perform inference in multilayer networks with arbitrary types of information. Our approach employs a Bayesian framework combined with the Laplace matching technique to ease interpretation of inferred parameters. Furthermore, the algorithmic implementation relies on automatic differentiation, avoiding the need for explicit derivations. This makes our model scalable and flexible to adapt to any combination of input data. We demonstrate the effectiveness of our method in detecting overlapping community structures and performing various prediction tasks on heterogeneous multilayer data, where nodes and edges have different types of attributes. Additionally, we showcase its ability to unveil a variety of patterns in a social support network among villagers in rural India by effectively utilizing all input information in a meaningful way.

[296]  arXiv:2405.20931 [pdf, ps, other]
Title: Finding Diverse Solutions Parameterized by Cliquewidth
Comments: 28 pages, 1 figure
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)

Finding a few solutions for a given problem that are diverse, as opposed to finding a single best solution to solve the problem, has recently become a notable topic in theoretical computer science. Recently, Baste, Fellows, Jaffke, Masařík, Oliveira, Philip, and Rosamond showed that under a standard structural parameterization by treewidth, one can find a set of diverse solutions for many problems with only a very small additional cost [Artificial Intelligence 2022]. In this paper, we investigate a much stronger graph parameter, the cliquewidth, which can additionally describe some dense graph classes. Broadly speaking, it describes graphs that can be recursively constructed by a few operations defined on graphs whose vertices are divided into a bounded number of groups while each such group behaves uniformly with respect to any operation.
We show that for any vertex problem, if we are given a dynamic program solving that problem on cliquewidth decomposition, we can modify it to produce a few solutions that are as diverse as possible with as little overhead as in the above-mentioned treewidth paper. As a consequence, we prove that a diverse version of any MSO$_1$ expressible problem can be solved in FPT time parameterized by cliquewidth, the number of sought solutions, and the number of quantifiers in the formula. That was an important missing piece in the complexity landscape of structural graph parameters and logic. We prove our results allowing for a more general natural collection of diversity functions compared to only two mostly studied diversity functions previously. That might be of independent interest as a larger pool of different diversity functions can highlight various aspects of different solutions to a problem.

[297]  arXiv:2405.20933 [pdf, ps, other]
Title: Concentration Bounds for Optimized Certainty Equivalent Risk Estimation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We consider the problem of estimating the Optimized Certainty Equivalent (OCE) risk from independent and identically distributed (i.i.d.) samples. For the classic sample average approximation (SAA) of OCE, we derive mean-squared error as well as concentration bounds (assuming sub-Gaussianity). Further, we analyze an efficient stochastic approximation-based OCE estimator, and derive finite sample bounds for the same. To show the applicability of our bounds, we consider a risk-aware bandit problem, with OCE as the risk. For this problem, we derive a bound on the probability of mis-identification. Finally, we conduct numerical experiments to validate the theoretical findings.
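As a rough illustration of the SAA estimator, the sketch below assumes one common loss-based convention, OCE(X) = inf over lam of { lam + E[phi(X - lam)] } with phi convex, and replaces the expectation by a sample mean. The abstract does not state which convention the paper uses, so the definition, the function names, and the CVaR example (phi(z) = max(z, 0) / (1 - alpha)) are assumptions. The search over lam is restricted to the sample points, which is exact for piecewise-linear phi such as CVaR and only approximate otherwise.

```python
import numpy as np

def oce_saa(samples, phi):
    """Sample average approximation of inf_lam { lam + mean(phi(x - lam)) }.

    Assumption: lam is searched over the observed sample values only.
    """
    x = np.asarray(samples, dtype=float)
    objective = [lam + phi(x - lam).mean() for lam in x]
    return float(min(objective))

def cvar_phi(alpha):
    """phi for which the OCE equals CVaR at level alpha (0 < alpha < 1)."""
    return lambda z: np.maximum(z, 0.0) / (1.0 - alpha)

losses = np.arange(1, 11)                    # losses 1, 2, ..., 10
cvar_80 = oce_saa(losses, cvar_phi(0.8))
```

With ten equally weighted losses and alpha = 0.8, the estimate is approximately 9.5, the average of the two largest losses, which matches the usual reading of CVaR as the mean of the worst 20% of outcomes.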

[298]  arXiv:2405.20935 [pdf, other]
Title: Effective Interplay between Sparsity and Quantization: From Theory to Practice
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The increasing size of deep neural networks necessitates effective model compression to improve computational efficiency and reduce their memory footprint. Sparsity and quantization are two prominent compression methods that have individually demonstrated significant reduction in computational and memory footprints while preserving model accuracy. While effective, the interplay between these two methods remains an open question. In this paper, we investigate the interaction between these two methods and assess whether their combination impacts final model accuracy. We mathematically prove that applying sparsity before quantization is the optimal sequence for these operations, minimizing error in computation. Our empirical studies across a wide range of models, including OPT and Llama model families (125M-8B) and ViT corroborate these theoretical findings. In addition, through rigorous analysis, we demonstrate that sparsity and quantization are not orthogonal; their interaction can significantly harm model accuracy, with quantization error playing a dominant role in this degradation. Our findings extend to the efficient deployment of large models in resource-limited compute platforms and reduce serving cost, offering insights into best practices for applying these compression methods to maximize efficacy without compromising accuracy.
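The question of operation order can be explored empirically with a small NumPy sketch. It is not the authors' experimental setup: it assumes unstructured magnitude pruning and symmetric uniform quantization with max scaling, and the function names are hypothetical. The sketch only measures the reconstruction error of both orders on a random matrix and makes no claim about which is smaller.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the `sparsity` fraction of entries with smallest magnitude."""
    flat = np.asarray(w, dtype=float).ravel().copy()
    k = int(sparsity * flat.size)
    if k > 0:
        smallest = np.argpartition(np.abs(flat), k - 1)[:k]
        flat[smallest] = 0.0
    return flat.reshape(np.shape(w))

def quantize(w, bits):
    """Symmetric uniform quantization with max-abs scaling (an assumption)."""
    w = np.asarray(w, dtype=float)
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    if max_abs == 0.0:
        return w.copy()
    scale = max_abs / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64))

sparse_then_quant = quantize(magnitude_prune(weights, 0.5), bits=4)
quant_then_sparse = magnitude_prune(quantize(weights, bits=4), 0.5)

err_sq = np.linalg.norm(weights - sparse_then_quant)
err_qs = np.linalg.norm(weights - quant_then_sparse)
```

Comparing `err_sq` and `err_qs` across bit widths, sparsity levels, and weight distributions gives a quick feel for how the two compression steps interact before running a full model evaluation.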

[299]  arXiv:2405.20947 [pdf, other]
Title: OR-Bench: An Over-Refusal Benchmark for Large Language Models
Comments: version 1
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) require careful safety alignment to prevent malicious outputs. While significant research focuses on mitigating harmful content generation, the enhanced safety often comes with the side effect of over-refusal, where the LLMs may reject innocuous prompts and become less helpful. Although the issue of over-refusal has been empirically observed, a systematic measurement is challenging due to the difficulty of crafting prompts that appear harmful but are benign. This study proposes a novel method for automatically generating large-scale sets of "seemingly toxic prompts" (benign prompts likely rejected by LLMs). Leveraging this technique, we introduce OR-Bench, the first large-scale over-refusal benchmark. OR-Bench comprises 80,000 seemingly toxic prompts across 10 common rejection categories, a subset of around 1,000 hard prompts that are challenging even for state-of-the-art LLMs, and an additional 600 toxic prompts to prevent indiscriminate responses. We then conduct a comprehensive study to measure the over-refusal of 25 popular LLMs across 8 model families. Our datasets are available at https://huggingface.co/datasets/bench-llm/OR-Bench and the corresponding demo can be found at https://huggingface.co/spaces/bench-llm/or-bench. We hope this benchmark can help the community develop better safety aligned models.

[300]  arXiv:2405.20951 [pdf, other]
Title: Monte Carlo Tree Search Satellite Scheduling Under Cloud Cover Uncertainty
Comments: 11 pages, 4 figures
Subjects: Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

Efficient utilization of satellite resources in dynamic environments remains a challenging problem in satellite scheduling. This paper addresses the multi-satellite collection scheduling problem (m-SatCSP), aiming to optimize task scheduling over a constellation of satellites under uncertain conditions such as cloud cover. Leveraging Monte Carlo Tree Search (MCTS), a stochastic search algorithm, two versions of MCTS are explored to schedule satellites effectively. Hyperparameter tuning is conducted to optimize the algorithm's performance. Experimental results demonstrate the effectiveness of the MCTS approach, outperforming existing methods in both solution quality and efficiency. Comparative analysis against other scheduling algorithms showcases competitive performance, positioning MCTS as a promising solution for satellite task scheduling in dynamic environments.

[301]  arXiv:2405.20954 [pdf, other]
Title: Aligning Multiclass Neural Network Classifier Criterion with Task Performance via $F_β$-Score
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Multiclass neural network classifiers are typically trained using cross-entropy loss. Following training, the performance of this same neural network is evaluated using an application-specific metric based on the multiclass confusion matrix, such as the Macro $F_\beta$-Score. It is questionable whether the use of cross-entropy will yield a classifier that aligns with the intended application-specific performance criteria, particularly in scenarios where there is a need to emphasize one aspect of classifier performance. For example, if greater precision is preferred over recall, the $\beta$ value in the $F_\beta$ evaluation metric can be adjusted accordingly, but the cross-entropy objective remains unaware of this preference during training. We propose a method that addresses this training-evaluation gap for multiclass neural network classifiers such that users can train these models informed by the desired final $F_\beta$-Score. Following prior work in binary classification, we utilize the concepts of the soft-set confusion matrices and a piecewise-linear approximation of the Heaviside step function. Our method extends the $2 \times 2$ binary soft-set confusion matrix to a multiclass $d \times d$ confusion matrix and proposes dynamic adaptation of the threshold value $\tau$, which parameterizes the piecewise-linear Heaviside approximation during run-time. We present a theoretical analysis that shows that our method can be used to optimize for a soft-set based approximation of Macro-$F_\beta$ that is a consistent estimator of Macro-$F_\beta$, and our extensive experiments show the practical effectiveness of our approach.
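The quantities named in this abstract can be illustrated with a small NumPy sketch of a d x d soft-set confusion matrix and the Macro F_beta score computed from it. The exact piecewise-linear Heaviside approximation used by the authors is not given here, so `soft_heaviside` below is a hypothetical stand-in (a linear ramp of half-width `delta` centred on the threshold `tau`), and the NumPy version only shows the arithmetic; a training loss would need the same computation in an autodiff framework.

```python
import numpy as np

def soft_heaviside(p, tau, delta=0.1):
    """Hypothetical piecewise-linear step: 0 below tau - delta, 1 above tau + delta."""
    return np.clip((p - tau) / (2.0 * delta) + 0.5, 0.0, 1.0)

def soft_confusion(probs, labels, tau):
    """Rows: true class. Columns: soft membership in each predicted class."""
    probs = np.asarray(probs, dtype=float)
    one_hot = np.eye(probs.shape[1])[np.asarray(labels)]
    return one_hot.T @ soft_heaviside(probs, tau)

def macro_f_beta(conf, beta):
    """Unweighted mean over classes of (1+b^2)TP / ((1+b^2)TP + b^2 FN + FP)."""
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    num = (1.0 + beta**2) * tp
    den = num + beta**2 * fn + fp
    scores = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(scores.mean())

labels = [0, 1, 2, 1]
perfect_probs = np.eye(3)[labels]            # one-hot, fully confident, all correct
conf = soft_confusion(perfect_probs, labels, tau=0.5)
score = macro_f_beta(conf, beta=0.5)
```

For fully confident, correct predictions the soft confusion matrix is diagonal and the score equals 1. Choosing `beta` below 1 weights precision more heavily than recall, which is the kind of preference the abstract says cross-entropy cannot express.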

[302]  arXiv:2405.20956 [pdf, other]
Title: A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians
Comments: 15 pages, 1 figure, published at ACM FAccT 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We interviewed twenty professional comedians who perform live shows in front of audiences and who use artificial intelligence in their artistic process as part of 3-hour workshops on "AI x Comedy" conducted at the Edinburgh Festival Fringe in August 2023 and online. The workshop consisted of a comedy writing session with large language models (LLMs), a human-computer interaction questionnaire to assess the Creativity Support Index of AI as a writing tool, and a focus group interrogating the comedians' motivations for and processes of using AI, as well as their ethical concerns about bias, censorship and copyright. Participants noted that existing moderation strategies used in safety filtering and instruction-tuned LLMs reinforced hegemonic viewpoints by erasing minority groups and their perspectives, and qualified this as a form of censorship. At the same time, most participants felt the LLMs did not succeed as a creativity support tool, by producing bland and biased comedy tropes, akin to "cruise ship comedy material from the 1950s, but a bit less racist". Our work extends scholarship about the subtle difference between, on the one hand, harmful speech, and on the other hand, "offensive" language as a practice of resistance, satire and "punching up". We also interrogate the global value alignment behind such language models, and discuss the importance of community-based value alignment and data ownership to build AI tools that better suit artists' needs.

[303]  arXiv:2405.20959 [pdf, other]
Title: Navigating Tabular Data Synthesis Research: Understanding User Needs and Tool Capabilities
Comments: 14 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)

In an era of rapidly advancing data-driven applications, there is a growing demand for data in both research and practice. Synthetic data have emerged as an alternative when no real data is available (e.g., due to privacy regulations). Synthesizing tabular data presents unique and complex challenges, especially handling (i) missing values, (ii) dataset imbalance, (iii) diverse column types, and (iv) complex data distributions, as well as preserving (i) column correlations, (ii) temporal dependencies, and (iii) integrity constraints (e.g., functional dependencies) present in the original dataset. While substantial progress has been made recently in the context of generational models, there is no one-size-fits-all solution for tabular data today, and choosing the right tool for a given task is therefore no trivial task. In this paper, we survey the state of the art in Tabular Data Synthesis (TDS), examine the needs of users by defining a set of functional and non-functional requirements, and compile the challenges associated with meeting those needs. In addition, we evaluate the reported performance of 36 popular research TDS tools about these requirements and develop a decision guide to help users find suitable TDS tools for their applications. The resulting decision guide also identifies significant research gaps.

[304]  arXiv:2405.20962 [pdf, other]
Title: Large Language Models are Zero-Shot Next Location Predictors
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Predicting the locations an individual will visit in the future is crucial for solving many societal issues like disease diffusion and reduction of pollution among many others. The models designed to tackle next-location prediction, however, require a significant amount of individual-level information to be trained effectively. Such data may be scarce or even unavailable in some geographic regions or peculiar scenarios (e.g., cold-start in recommendation systems). Moreover, the design of a next-location predictor able to generalize or geographically transfer knowledge is still an open research challenge. Recent advances in natural language processing have led to a rapid diffusion of Large Language Models (LLMs) which have shown good generalization and reasoning capabilities. These insights, coupled with the recent findings that LLMs are rich in geographical knowledge, led us to believe that these models can act as zero-shot next-location predictors. This paper evaluates the capabilities of many popular LLMs in this role, specifically Llama, GPT-3.5 and Mistral 7B. After designing a proper prompt, we tested the models on three real-world mobility datasets. The results show that LLMs can obtain accuracies up to 32.4%, a significant relative improvement of over 600% when compared to sophisticated DL models specifically designed for human mobility. Moreover, we show that other LLMs are unable to perform the task properly. To prevent positively biased results, we also propose a framework inspired by other studies to test data contamination. Finally, we explored the possibility of using LLMs as text-based explainers for next-location prediction, showing that they can effectively provide an explanation for their decision. Notably, 7B models provide more generic, but still reliable, explanations compared to larger counterparts. Code: github.com/ssai-trento/LLM-zero-shot-NL

[305]  arXiv:2405.20967 [pdf, other]
Title: Superlatives in Context: Explicit and Implicit Domain Restrictions for Superlative Frames
Comments: 11 pages
Subjects: Computation and Language (cs.CL)

Superlatives are used to single out elements with a maximal/minimal property. Semantically, superlatives perform a set comparison: something (or some things) has the min/max property out of a set. As such, superlatives provide an ideal phenomenon for studying implicit phenomena and discourse restrictions. While this comparison set is often not explicitly defined, its (implicit) restrictions can be inferred from the discourse context the expression appears in. In this work we provide an extensive computational study on the semantics of superlatives. We propose a unified account of superlative semantics which allows us to derive a broad-coverage annotation schema. Using this unified schema we annotated a multi-domain dataset of superlatives and their semantic interpretations. We specifically focus on interpreting implicit or ambiguous superlative expressions, by analyzing how the discourse context restricts the set of interpretations. In a set of experiments we then analyze how well models perform at variations of predicting superlative semantics, with and without context. We show that the fine-grained semantics of superlatives in context can be challenging for contemporary models, including GPT-4.

[306]  arXiv:2405.20968 [pdf, other]
Title: A new multivariate primitive from CCZ equivalence
Subjects: Cryptography and Security (cs.CR)

Multivariate Cryptography is one of the main candidates for Post-quantum Cryptography. Multivariate schemes are usually constructed by applying two secret affine invertible transformations $\mathcal S,\mathcal T$ to a set of multivariate polynomials $\mathcal{F}$ (often quadratic). The secret polynomials $\mathcal{F}$ possess a trapdoor that allows the legitimate user to find a solution of the corresponding system, while the public polynomials $\mathcal G=\mathcal S\circ\mathcal F\circ\mathcal T$ look like random polynomials. The polynomials $\mathcal G$ and $\mathcal F$ are said to be affine equivalent. In this article, we present a more general way of constructing a multivariate scheme by considering the CCZ equivalence, which has been introduced and studied in the context of vectorial Boolean functions.

[307]  arXiv:2405.20969 [pdf, other]
Title: Design, Calibration, and Control of Compliant Force-sensing Gripping Pads for Humanoid Robots
Comments: 21 pages, 16 figures, Published in ASME Journal of Mechanisms and Robotics
Journal-ref: Journal of Mechanisms and Robotics, 15, 031010, 2023
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This paper introduces a pair of low-cost, light-weight and compliant force-sensing gripping pads used for manipulating box-like objects with smaller-sized humanoid robots. These pads measure normal gripping forces and center of pressure (CoP). A calibration method is developed to improve the CoP measurement accuracy. A hybrid force-alignment-position control framework is proposed to regulate the gripping forces and to ensure the surface alignment between the grippers and the object. Limit surface theory is incorporated as a contact friction modeling approach to determine the magnitude of gripping forces for slippage avoidance. The integrated hardware and software system is demonstrated with a NAO humanoid robot. Experiments show the effectiveness of the overall approach.

[308]  arXiv:2405.20971 [pdf, other]
Title: Amortizing intractable inference in diffusion models for vision, language, and control
Comments: Code: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or likelihood function $r(\mathbf{x})$. We state and prove the asymptotic correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior, a problem that existing methods solve only approximately or in restricted cases. Relative trajectory balance arises from the generative flow network perspective on diffusion models, which allows the use of deep reinforcement learning techniques to improve mode coverage. Experiments illustrate the broad potential of unbiased inference of arbitrary posteriors under diffusion priors: in vision (classifier guidance), language (infilling under a discrete diffusion LLM), and multimodal data (text-to-image generation). Beyond generative modeling, we apply relative trajectory balance to the problem of continuous control with a score-based behavior prior, achieving state-of-the-art results on benchmarks in offline reinforcement learning.

[309]  arXiv:2405.20972 [pdf, other]
Title: Congestion-Aware Path Re-routing Strategy for Dense Urban Airspace
Subjects: Multiagent Systems (cs.MA); Systems and Control (eess.SY)

Existing UAS Traffic Management (UTM) frameworks designate preplanned flight paths to uncrewed aircraft systems (UAS), enabling the UAS to deliver payloads. However, with increasing delivery demand between the source-destination pairs in the urban airspace, UAS will likely experience considerable congestion on the nominal paths. We propose a rule-based congestion mitigation strategy that improves UAS safety and airspace utilization in congested traffic streams. The strategy relies on nominal path information from the UTM and positional information of other UAS in the vicinity. Following the strategy, UAS opts for alternative local paths in the unoccupied airspace surrounding the nominal path and avoids congested regions. The strategy results in UAS traffic exploring and spreading to alternative adjacent routes on encountering congestion. The paper presents queuing models to estimate the expected traffic spread for varying stochastic delivery demand at the source, thus helping to reserve the airspace around the nominal path beforehand to accommodate any foreseen congestion. Simulations are presented to validate the queuing results in the presence of static obstacles and intersecting UAS streams.

[310]  arXiv:2405.20973 [pdf, other]
Title: LCQ: Low-Rank Codebook based Quantization for Large Language Models
Authors: Wen-Pu Cai, Wu-Jun Li
Comments: 10 pages, 5 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Large language models (LLMs) have recently demonstrated promising performance in many tasks. However, the high storage and computational cost of LLMs has become a challenge for deploying them. Weight quantization has been widely used for model compression, which can reduce both storage and computational cost. Most existing weight quantization methods for LLMs use a rank-one codebook for quantization, which results in substantial accuracy loss when the compression ratio is high. In this paper, we propose a novel weight quantization method, called low-rank codebook based quantization (LCQ), for LLMs. LCQ adopts a low-rank codebook, the rank of which can be larger than one, for quantization. Experiments show that LCQ can achieve better accuracy than existing methods with a negligible extra storage cost.

[311]  arXiv:2405.20974 [pdf, other]
Title: SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
Comments: The code is available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models (LLMs) often generate inaccurate or fabricated information and generally fail to indicate their confidence, which limits their broader applications. Previous work elicits confidence from LLMs by direct or self-consistency prompting, or constructing specific datasets for supervised finetuning. The prompting-based approaches have inferior performance, and the training-based approaches are limited to binary or inaccurate group-level confidence estimates. In this work, we present the advanced SaySelf, a training framework that teaches LLMs to express more accurate fine-grained confidence estimates. In addition, beyond the confidence scores, SaySelf initiates the process of directing LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge and explain their uncertainty. This is achieved by using an LLM to automatically summarize the uncertainties in specific knowledge via natural language. The summarization is based on the analysis of the inconsistency in multiple sampled reasoning chains, and the resulting data is utilized for supervised fine-tuning. Moreover, we utilize reinforcement learning with a meticulously crafted reward function to calibrate the confidence estimates, motivating LLMs to deliver accurate, high-confidence predictions and to penalize overconfidence in erroneous outputs. Experimental results in both in-distribution and out-of-distribution datasets demonstrate the effectiveness of SaySelf in reducing the confidence calibration error and maintaining the task performance. We show that the generated self-reflective rationales are reasonable and can further contribute to the calibration. The code is made public at https://github.com/xu1868/SaySelf.

[312]  arXiv:2405.20975 [pdf, other]
Title: ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning
Comments: To appear in the 33rd USENIX Security Symposium, 2024
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In Federated Learning (FL), a set of clients collaboratively train a machine learning model (called global model) without sharing their local training data. The local training data of clients is typically non-i.i.d. and heterogeneous, resulting in varying contributions from individual clients to the final performance of the global model. In response, many contribution evaluation methods were proposed, where the server could evaluate the contribution made by each client and incentivize the high-contributing clients to sustain their long-term participation in FL. Existing studies mainly focus on developing new metrics or algorithms to better measure the contribution of each client. However, the security of contribution evaluation methods of FL operating in adversarial environments is largely unexplored. In this paper, we propose the first model poisoning attack on contribution evaluation methods in FL, termed ACE. Specifically, we show that any malicious client utilizing ACE could manipulate the parameters of its local model such that it is evaluated to have a high contribution by the server, even when its local training data is in fact of low quality. We perform both theoretical analysis and empirical evaluations of ACE. Theoretically, we show our design of ACE can effectively boost the malicious client's perceived contribution when the server employs the widely-used cosine distance metric to measure contribution. Empirically, our results show ACE effectively and efficiently deceives five state-of-the-art contribution evaluation methods. In addition, ACE preserves the accuracy of the final global models on testing inputs. We also explore six countermeasures to defend against ACE. Our results show they are inadequate to thwart ACE, highlighting the urgent need for new defenses to safeguard the contribution evaluation methods in FL.
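
As a rough illustration of the cosine-based contribution metric mentioned in the abstract above, the following sketch scores each client by the cosine similarity between its update and the averaged update. The function names, the plain averaging rule, and the toy vectors are assumptions made for illustration; the abstract does not specify how the server aggregates or scores, and this is not the ACE attack itself.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def contribution_scores(client_updates):
    """Score each client by how well its update aligns with the mean update.

    Hypothetical scoring rule: plain averaging followed by cosine similarity.
    """
    n = len(client_updates)
    aggregate = [sum(coords) / n for coords in zip(*client_updates)]
    return [cosine_similarity(update, aggregate) for update in client_updates]


# Two clients agree on a direction; the third points elsewhere.
updates = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
scores = contribution_scores(updates)
```

Under a rule of this kind, a client whose update direction matches the aggregate receives a higher score regardless of the quality of the data behind it, which is the sort of weakness the abstract says a poisoning attack can exploit.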

[313]  arXiv:2405.20976 [pdf, ps, other]
Title: Matrix Rationalization via Partial Orders
Subjects: Discrete Mathematics (cs.DM); Computer Science and Game Theory (cs.GT)

A preference matrix $M$ has an entry for each pair of candidates in an election whose value $p_{ij}$ represents the proportion of voters that prefer candidate $i$ over candidate $j$. The matrix is rationalizable if it is consistent with a set of voters whose preferences are total orders. A celebrated open problem asks for a concise characterization of rationalizable preference matrices. In this paper, we generalize this matrix rationalizability question and study when a preference matrix is consistent with a set of voters whose preferences are partial orders of width $\alpha$. The width (the maximum cardinality of an antichain) of the partial order is a natural measure of the rationality of a voter; indeed, a partial order of width $1$ is a total order. Our primary focus concerns the rationality number, the minimum width required to rationalize a preference matrix. We present two main results. The first concerns the class of half-integral preference matrices, where we show the key parameter required in evaluating the rationality number is the chromatic number of the undirected unanimity graph associated with the preference matrix $M$. The second concerns the class of integral preference matrices, where we show the key parameter now is the dichromatic number of the directed voting graph associated with $M$.
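
To make the definition of a preference matrix concrete, here is a minimal sketch that builds the entries $p_{ij}$ from a set of voters whose preferences are total orders. The function name and the three-voter example are illustrative assumptions; a matrix produced this way is rationalizable by construction.

```python
from fractions import Fraction
from itertools import combinations


def preference_matrix(rankings, candidates):
    """Return p[i][j] = fraction of voters who rank candidate i above candidate j.

    Each ranking is a list of all candidates, most preferred first (a total order).
    """
    share = Fraction(1, len(rankings))
    p = {i: {j: Fraction(0) for j in candidates if j != i} for i in candidates}
    for ranking in rankings:
        position = {c: k for k, c in enumerate(ranking)}
        for i, j in combinations(candidates, 2):
            if position[i] < position[j]:
                p[i][j] += share
            else:
                p[j][i] += share
    return p


# Three voters forming a Condorcet cycle over candidates a, b, c.
voters = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
M = preference_matrix(voters, ["a", "b", "c"])
```

In this example each of a over b, b over c, and c over a is preferred by two of the three voters, and $p_{ij} + p_{ji} = 1$ holds for every pair because every voter compares every pair.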

[314]  arXiv:2405.20978 [pdf, other]
Title: Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training
Journal-ref: ACL 2024, Main Conference
Subjects: Artificial Intelligence (cs.AI)

Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges. However, inappropriate retrieved passages can potentially hinder the LLMs' capacity to generate comprehensive and high-quality responses. Prior RAG studies on the robustness of retrieval noises often confine themselves to a limited set of noise types, deviating from real-world retrieval environments and limiting practical applicability. In this study, we initially investigate retrieval noises and categorize them into three distinct types, reflecting real-world environments. We analyze the impact of these various retrieval noises on the robustness of LLMs. Subsequently, we propose a novel RAG approach known as Retrieval-augmented Adaptive Adversarial Training (RAAT). RAAT leverages adaptive adversarial training to dynamically adjust the model's training process in response to retrieval noises. Concurrently, it employs multi-task learning to ensure the model's capacity to internally recognize noisy contexts. Extensive experiments demonstrate that the LLaMA-2 7B model trained using RAAT exhibits significant improvements in F1 and EM scores under diverse noise conditions. For reproducibility, we release our code and data at: https://github.com/calubkk/RAAT.

[315]  arXiv:2405.20980 [pdf, other]
Title: Neural Gaussian Scale-Space Fields
Comments: 15 pages; SIGGRAPH 2024; project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)

Gaussian scale spaces are a cornerstone of signal representation and processing, with applications in filtering, multiscale analysis, anti-aliasing, and many more. However, obtaining such a scale space is costly and cumbersome, in particular for continuous representations such as neural fields. We present an efficient and lightweight method to learn the fully continuous, anisotropic Gaussian scale space of an arbitrary signal. Based on Fourier feature modulation and Lipschitz bounding, our approach is trained self-supervised, i.e., training does not require any manual filtering. Our neural Gaussian scale-space fields faithfully capture multiscale representations across a broad range of modalities, and support a diverse set of applications. These include images, geometry, light-stage data, texture anti-aliasing, and multiscale optimization.

[316]  arXiv:2405.20981 [pdf, other]
Title: Generative Adversarial Networks in Ultrasound Imaging: Extending Field of View Beyond Conventional Limits
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Transthoracic Echocardiography (TTE) is a fundamental, non-invasive diagnostic tool in cardiovascular medicine, enabling detailed visualization of cardiac structures crucial for diagnosing various heart conditions. Despite its widespread use, TTE ultrasound imaging faces inherent limitations, notably the trade-off between field of view (FoV) and resolution. This paper introduces a novel application of conditional Generative Adversarial Networks (cGANs), specifically designed to extend the FoV in TTE ultrasound imaging while maintaining high resolution. Our proposed cGAN architecture, termed echoGAN, demonstrates the capability to generate realistic anatomical structures through outpainting, effectively broadening the viewable area in medical imaging. This advancement has the potential to enhance both automatic and manual ultrasound navigation, offering a more comprehensive view that could significantly reduce the learning curve associated with ultrasound imaging and aid in more accurate diagnoses. The results confirm that echoGAN reliably reproduces detailed cardiac features, thereby promising a significant step forward in the field of non-invasive cardiac navigation and diagnostics.

[317]  arXiv:2405.20982 [pdf, other]
Title: Scaling Data Plane Verification with Intent-based Slicing
Subjects: Networking and Internet Architecture (cs.NI)

Data plane verification has grown into a powerful tool to ensure network correctness. However, existing monolithic data plane models have high memory requirements with large networks, and the existing method of scaling out is too limited in expressiveness to capture practical network features. In this paper, we describe Scylla, a general data plane verifier that provides fine-grained scale-out without the need for a monolithic network model. Scylla creates models for what we call intent-based slices, each of which is constructed at a fine (rule-level) granularity with just enough to verify a given set of intents. The sliced models are retained in memory across a cluster and are incrementally updated in a distributed compute cluster in response to network updates. Our experiments show that Scylla makes the scaling problem more granular -- tied to the size of the intent-based slices rather than that of the overall network. This enables Scylla to verify large, complex networks in minimum units of work that are significantly smaller (in both memory and time) than past techniques, enabling fast scale-out verification with minimal resource requirement.

[318]  arXiv:2405.20983 [pdf, other]
Title: Goal-Oriented Sensor Reporting Scheduling for Non-linear Dynamic System Monitoring
Subjects: Systems and Control (eess.SY)

Goal-oriented communication (GoC) is a form of semantic communication where the effectiveness of information transmission is measured by its impact on achieving the desired goal. In the context of the Internet of Things (IoT), GoC can make IoT sensors selectively transmit data pertinent to the intended goals of the receiver. Therefore, GoC holds significant value for IoT networks as it facilitates timely decision-making at the receiver, reduces network congestion, and enhances spectral efficiency. In this paper, we consider a scenario where an edge node polls sensors monitoring the state of a non-linear dynamic system (NLDS) to respond to the queries of several clients. Our work delves into the foregoing GoC problem, which we term goal-oriented scheduling (GoS). Our proposed GoS utilizes deep reinforcement learning (DRL) with meticulously devised action space, state space, and reward function. The devised action space and reward function play a pivotal role in reducing the number of sensor transmissions. Meanwhile, the devised state space empowers our DRL scheduler to poll the sensor whose observation is expected to minimize the mean square error (MSE) of the query responses. Our numerical analysis demonstrates that the proposed GoS can either further reduce the query response MSE or obtain a comparable MSE relative to benchmark scheduling methods, depending on the type of query. Furthermore, the proposed GoS proves to be energy-efficient for the sensors and of lower complexity compared to benchmark scheduling methods.

[319]  arXiv:2405.20984 [pdf, other]
Title: Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Comments: Forty-first International Conference on Machine Learning (ICML), 2024
Subjects: Machine Learning (cs.LG)

Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, offline learned policies are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop. We show that Bayesian design principles are crucial in solving such a dilemma. Instead of adopting optimistic or pessimistic policies, the agent should act in a way that matches its belief in optimal policies. Such a probability-matching agent can avoid a sudden performance drop while still being guaranteed to find the optimal policy. Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on various benchmarks, demonstrating the efficacy of our approach. Overall, the proposed approach provides a new perspective on offline-to-online RL that has the potential to enable more effective learning from offline data.

[320]  arXiv:2405.20985 [pdf, other]
Title: DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The visual projector, which bridges the vision and language modalities and facilitates cross-modal alignment, serves as a crucial component in MLLMs. However, measuring the effectiveness of projectors in vision-language alignment remains under-explored, which currently can only be inferred from the performance of MLLMs on downstream tasks. Motivated by the problem, this study examines the projector module by interpreting the vision-language semantic flow within MLLMs. Specifically, we trace back the semantic relevance flow from generated language tokens to raw visual encoder patches and the intermediate outputs produced by projectors. Our findings reveal that compressive projectors (e.g., QFormer) abstract visual patches into a limited set of semantic concepts, such as objects or attributes, resulting in a 'double abstraction' phenomenon. This involves a first visual semantic abstraction by the projector referring to pre-defined query tokens, and a second extraction by the LLM based on text instructions. The double abstraction is inefficient in training and will result in cumulative vision semantics deficiency. To mitigate this issue, we propose the key insight of 'Decouple Compression from Abstraction' (DeCo), that is, compressing the visual token number at the patch level by projectors and allowing the LLM to handle visual semantic abstraction entirely. Consequently, we adopt a simple compressor, i.e., 2D Adaptive Pooling, to downsample visual patches in a parameter-free manner. Empirical evaluation demonstrates that DeCo surpasses traditional compressive projectors regarding both performance and efficiency. It achieves performance gains of 0.9%, 7.1%, and 2.9% across the MLLM Benchmarks, Visual Localization, and Open-ended VQA tasks with fewer trainable parameters and faster convergence speed.
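
The parameter-free compression step named in the abstract above, 2D adaptive pooling, can be sketched in a few lines. This is a minimal pure-Python version using average pooling with the usual floor/ceil window rule; the choice of averaging, the function name, and the toy 4 x 4 grid are assumptions for illustration, not the paper's implementation.

```python
def adaptive_avg_pool_2d(grid, out_h, out_w):
    """Average-pool an H x W grid of feature vectors down to out_h x out_w.

    Output cell (i, j) averages input rows floor(i*H/out_h) .. ceil((i+1)*H/out_h) - 1
    and the analogous column window, so no learned parameters are involved.
    """
    in_h, in_w = len(grid), len(grid[0])
    dim = len(grid[0][0])
    pooled = []
    for i in range(out_h):
        r0 = (i * in_h) // out_h
        r1 = ((i + 1) * in_h + out_h - 1) // out_h  # integer ceiling
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = ((j + 1) * in_w + out_w - 1) // out_w
            cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append([sum(v[d] for v in cells) / len(cells) for d in range(dim)])
        pooled.append(row)
    return pooled


# A 4 x 4 grid of 1-dimensional "patch features" holding the values 0..15.
patches = [[[float(4 * r + c)] for c in range(4)] for r in range(4)]
tokens = adaptive_avg_pool_2d(patches, 2, 2)  # 16 patch tokens -> 4 tokens
```

Here the visual token count drops from 16 to 4 while each output token remains a spatial average of raw patches, leaving any semantic abstraction to the downstream model.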

[321]  arXiv:2405.20986 [pdf, other]
Title: Uncertainty Quantification for Bird's Eye View Semantic Segmentation: Methods and Benchmarks
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The fusion of raw features from multiple sensors on an autonomous vehicle to create a Bird's Eye View (BEV) representation is crucial for planning and control systems. There is growing interest in using deep learning models for BEV semantic segmentation. Anticipating segmentation errors and improving the explainability of DNNs is essential for autonomous driving, yet it is under-studied. This paper introduces a benchmark for predictive uncertainty quantification in BEV segmentation. The benchmark assesses various approaches across three popular datasets using two representative backbones and focuses on the effectiveness of predicted uncertainty in identifying misclassified and out-of-distribution (OOD) pixels, as well as calibration. Empirical findings highlight the challenges in uncertainty quantification. Our results find that evidential deep learning based approaches show the most promise by efficiently quantifying aleatoric and epistemic uncertainty. We propose the Uncertainty-Focal-Cross-Entropy (UFCE) loss, designed for highly imbalanced data, which consistently improves the segmentation quality and calibration. Additionally, we introduce a vacuity-scaled regularization term that enhances the model's focus on high uncertainty pixels, improving epistemic uncertainty quantification.

[322]  arXiv:2405.20987 [pdf, other]
Title: Early Stopping Criteria for Training Generative Adversarial Networks in Biomedical Imaging
Comments: This paper is accepted at the 35th IEEE Irish Signals and Systems Conference (ISSC 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Generative Adversarial Networks (GANs) have high computational costs to train their complex architectures. Throughout the training process, GANs' output is analyzed qualitatively based on the loss and synthetic images' diversity and quality. Based on this qualitative analysis, training is manually halted once the desired synthetic images are generated. By utilizing an early stopping criterion, the computational cost and dependence on manual oversight can be reduced yet impacted by training problems such as mode collapse, non-convergence, and instability. This is particularly prevalent in biomedical imagery, where training problems degrade the diversity and quality of synthetic images, and the high computational cost associated with training makes complex architectures increasingly inaccessible. This work proposes novel early stopping criteria to quantitatively detect training problems, halt training, and reduce the computational costs associated with synthesizing biomedical images. Firstly, the range of generator and discriminator loss values is investigated to assess whether mode collapse, non-convergence, and instability occur sequentially, concurrently, or interchangeably throughout the training of GANs. Secondly, utilizing these occurrences in conjunction with the Mean Structural Similarity Index (MS-SSIM) and Fréchet Inception Distance (FID) scores of synthetic images forms the basis of the proposed early stopping criteria. This work helps identify the occurrence of training problems in GANs using low-resource computational cost and reduces training time to generate diversified and high-quality synthetic images.

[323]  arXiv:2405.20988 [pdf, other]
Title: Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Driven by the ever-growing volume and decentralized nature of data, coupled with the escalating size of modern models, distributed deep learning (DDL) has been entrenched as the preferred paradigm for training. However, frequent synchronization of DL models, encompassing millions to many billions of parameters, creates a communication bottleneck, severely hindering scalability. Worse yet, DDL algorithms typically waste valuable bandwidth, and make themselves less practical in bandwidth-constrained federated settings, by relying on overly simplistic, periodic, and rigid synchronization schedules. To address these shortcomings, we propose Federated Dynamic Averaging (FDA), a communication-efficient DDL strategy that dynamically triggers synchronization based on the value of the model variance. Through extensive experiments across a wide range of learning tasks we demonstrate that FDA reduces communication cost by orders of magnitude, compared to both traditional and cutting-edge communication-efficient algorithms. Remarkably, FDA achieves this without sacrificing convergence speed - in stark contrast to the trade-offs encountered in the field. Additionally, we show that FDA maintains robust performance across diverse data heterogeneity settings.
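
The idea of triggering synchronization from the model variance, as described in the abstract above, can be illustrated with a small sketch. It computes the variance directly from all worker models, which is only illustrative: the abstract does not say how FDA obtains or estimates this value, and the function names, threshold, and toy parameter vectors are assumptions.

```python
def average(models):
    """Coordinate-wise mean of a list of equal-length parameter vectors."""
    n = len(models)
    return [sum(coords) / n for coords in zip(*models)]


def model_variance(models):
    """Mean squared Euclidean distance of the worker models from their average."""
    mean = average(models)
    total = sum(sum((w - m) ** 2 for w, m in zip(model, mean)) for model in models)
    return total / len(models)


def maybe_synchronize(models, threshold):
    """Replace every worker model with the average only if the variance is too high.

    Returns the (possibly updated) models and whether a synchronization happened.
    """
    if model_variance(models) > threshold:
        mean = average(models)
        return [list(mean) for _ in models], True
    return models, False


workers = [[0.0, 0.0], [2.0, 0.0]]  # variance is 1.0
synced, did_sync = maybe_synchronize(workers, threshold=0.5)
unsynced, skipped_sync = maybe_synchronize(workers, threshold=2.0)
```

Compared with a fixed periodic schedule, a rule of this shape spends communication only when the workers have drifted apart, which is the behaviour the abstract contrasts with rigid synchronization schedules.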

[324]  arXiv:2405.20990 [pdf, other]
Title: Locking Machine Learning Models into Hardware
Comments: 10 pages, 2 figures of main text; 14 pages, 16 figures of appendices
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Modern Machine Learning models are expensive IP and business competitiveness often depends on keeping this IP confidential. This in turn restricts how these models are deployed -- for example it is unclear how to deploy a model on-device without inevitably leaking the underlying model. At the same time, confidential computing technologies such as Multi-Party Computation or Homomorphic encryption remain impractical for wide adoption. In this paper we take a different approach and investigate the feasibility of ML-specific mechanisms that deter unauthorized model use by restricting the model to only be usable on specific hardware, making adoption on unauthorized hardware inconvenient. That way, even if IP is compromised, it cannot be trivially used without specialised hardware or major model adjustment. In a sense, we seek to enable cheap locking of machine learning models into specific hardware. We demonstrate that locking mechanisms are feasible by either targeting efficiency of model representations, such as making models incompatible with quantisation, or tying the model's operation to specific characteristics of hardware, such as number of cycles for arithmetic operations. We demonstrate that locking comes with negligible work and latency overheads, while significantly restricting usability of the resultant model on unauthorized hardware.

[325]  arXiv:2405.20991 [pdf, other]
Title: Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models
Comments: IEEE Intelligent Vehicles Symposium (IV) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Addressing hard cases in autonomous driving, such as anomalous road users, extreme weather conditions, and complex traffic interactions, presents significant challenges. To ensure safety, it is crucial to detect and manage these scenarios effectively for autonomous driving systems. However, the rarity and high-risk nature of these cases demand extensive, diverse datasets for training robust models. Vision-Language Foundation Models (VLMs) have shown remarkable zero-shot capabilities, as they are trained on extensive datasets. This work explores the potential of VLMs in detecting hard cases in autonomous driving. We demonstrate the capability of VLMs such as GPT-4v in detecting hard cases in traffic participant motion prediction on both agent and scenario levels. We introduce a feasible pipeline where VLMs, fed with sequential image frames with designed prompts, effectively identify challenging agents or scenarios, which are verified by existing prediction models. Moreover, by taking advantage of this detection of hard cases by VLMs, we further improve the training efficiency of the existing motion prediction pipeline by performing data selection for the training samples suggested by GPT. We show the effectiveness and feasibility of our pipeline incorporating VLMs with state-of-the-art methods on NuScenes datasets. The code is accessible at https://github.com/KTH-RPL/Detect_VLM.

[326]  arXiv:2405.20993 [pdf, other]
Title: Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise
Subjects: Information Theory (cs.IT); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Statistics Theory (math.ST)

We consider a prototypical problem of Bayesian inference for a structured spiked model: a low-rank signal is corrupted by additive noise. While both information-theoretic and algorithmic limits are well understood when the noise is i.i.d. Gaussian, the more realistic case of structured noise still proves to be challenging. To capture the structure while maintaining mathematical tractability, a line of work has focused on rotationally invariant noise. However, existing studies either provide sub-optimal algorithms or they are limited to a special class of noise ensembles. In this paper, we establish the first characterization of the information-theoretic limits for a noise matrix drawn from a general trace ensemble. These limits are then achieved by an efficient algorithm inspired by the theory of adaptive Thouless-Anderson-Palmer (TAP) equations. Our approach leverages tools from statistical physics (replica method) and random matrix theory (generalized spherical integrals), and it unveils the equivalence between the rotationally invariant model and a surrogate Gaussian model.

[327]  arXiv:2405.20994 [pdf, other]
Title: CWRCzech: 100M Query-Document Czech Click Dataset and Its Application to Web Relevance Ranking
Comments: Accepted to SIGIR 2024
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

We present CWRCzech, Click Web Ranking dataset for Czech, a 100M query-document Czech click dataset for relevance ranking with user behavior data collected from search engine logs of Seznam.cz. To the best of our knowledge, CWRCzech is the largest click dataset with raw text published so far. It provides document positions in the search results as well as information about user behavior: 27.6M clicked documents and 10.8M dwell times. In addition, we also publish a manually annotated Czech test for the relevance task, containing nearly 50k query-document pairs, each annotated by at least 2 annotators. Finally, we analyze how the user behavior data improve relevance ranking and show that models trained on data automatically harnessed at sufficient scale can surpass the performance of models trained on human annotated data. CWRCzech is published under an academic non-commercial license and is available to the research community at https://github.com/seznam/CWRCzech.

[328]  arXiv:2405.21003 [pdf, other]
Title: Explaining Predictions by Characteristic Rules
Comments: Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022
Journal-ref: In: Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13713. Springer, Cham (2023)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Characteristic rules have been advocated for their ability to improve interpretability over discriminative rules within the area of rule learning. However, the former type of rule has not yet been used by techniques for explaining predictions. A novel explanation technique, called CEGA (Characteristic Explanatory General Association rules), is proposed, which employs association rule mining to aggregate multiple explanations generated by any standard local explanation technique into a set of characteristic rules. An empirical investigation is presented, in which CEGA is compared to two state-of-the-art methods, Anchors and GLocalX, for producing local and aggregated explanations in the form of discriminative rules. The results suggest that the proposed approach provides a better trade-off between fidelity and complexity compared to the two state-of-the-art approaches; CEGA and Anchors significantly outperform GLocalX with respect to fidelity, while CEGA and GLocalX significantly outperform Anchors with respect to the number of generated rules. The effect of changing the format of the explanations of CEGA to discriminative rules and using LIME and SHAP as local explanation techniques instead of Anchors is also investigated. The results show that the characteristic explanatory rules still compete favorably with rules in the standard discriminative format. The results also indicate that using CEGA in combination with either SHAP or Anchors consistently leads to a higher fidelity compared to using LIME as the local explanation technique.

[329]  arXiv:2405.21004 [pdf, other]
Title: MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses
Comments: 8 pages, 7 figures
Subjects: Human-Computer Interaction (cs.HC); Emerging Technologies (cs.ET)

We introduce MunchSonic, an AI-powered active acoustic sensing system integrated into eyeglasses, designed to track fine-grained dietary actions like hand-to-mouth movements for food intake, chewing, and drinking. MunchSonic emits inaudible ultrasonic waves from a commodity eyeglass frame. The reflected signals contain rich information about the position and movements of various body parts, including the mouth, jaw, arms, and hands, all of which are involved in eating activities. These signals are then processed by a custom deep-learning pipeline to classify six actions: food intake, chewing, drinking, talking, face-hand touching, and other activities (null). In an unconstrained user study with 12 participants, MunchSonic achieves a 93.5% macro F1-score in a user-independent evaluation with a 2-second time resolution, demonstrating its effectiveness. Additionally, MunchSonic accurately tracks eating episodes and the frequency of food intake within those episodes.
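
The macro F1-score reported above is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. A minimal sketch of that metric follows; the function name and the toy labels are illustrative assumptions and are not taken from the paper or its evaluation code.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); define it as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with two of the six action classes named in the abstract.
truth = ["chewing", "chewing", "drinking", "drinking"]
preds = ["chewing", "drinking", "drinking", "drinking"]
score = macro_f1(truth, preds, ["chewing", "drinking"])
```

Because every class has equal weight, a classifier that ignores an infrequent action such as drinking is penalized even if its overall accuracy is high.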

[330]  arXiv:2405.21005 [pdf, other]
Title: Influx ratio preserving coupling conditions for the networked Lighthill-Whitham-Richards model
Authors: Niklas Kolbe
Comments: 7 pages, 2 figures
Subjects: Numerical Analysis (math.NA)

A new coupling rule for the Lighthill-Whitham-Richards model at merging junctions is introduced that imposes the preservation of the ratio between inflow from a given road to the total inflow into the junction. This rule is considered both in the context of the original traffic flow model and a relaxation setting, giving rise to two different Riemann solvers that are discussed for merging 2-to-1 junctions. Numerical experiments suggest that the relaxation-based Riemann solver is capable of suitable predictions of both free-flow and congestion scenarios without relying on flow maximization.

[331]  arXiv:2405.21009 [pdf, other]
Title: FunLess: Functions-as-a-Service for Private Edge Cloud Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We present FunLess, a Function-as-a-Service (FaaS) platform tailored for the private edge cloud system. FunLess responds to recent trends that advocate for extending the coverage of serverless computing to private edge cloud systems and enhancing latency, security, and privacy while improving resource usage. Unlike existing solutions that rely on containers for function invocation, FunLess leverages WebAssembly (Wasm) as its runtime environment. Wasm's lightweight, sandboxed runtime is crucial to have functions run on constrained devices at the edge. Moreover, the advantages of using Wasm in FunLess include a consistent development and deployment environment for users and function portability (write once, run everywhere).
We validate FunLess under different deployment scenarios, characterised by the presence/absence of constrained-resource devices (Raspberry Pi 3B+) and the (in)accessibility of container orchestration technologies - Kubernetes. We compare FunLess with three production-ready, widely adopted open-source FaaS platforms - OpenFaaS, Fission, and Knative. Our benchmarks confirm that FunLess is a proper solution for FaaS private edge cloud systems since it achieves performance comparable to the considered FaaS alternatives while it is the only fully-deployable alternative on constrained-resource devices, thanks to its small memory footprint.

[332]  arXiv:2405.21010 [pdf, ps, other]
Title: Likelihood Equilibria in the Ising Game
Authors: Andrey Leonidov
Subjects: Computer Science and Game Theory (cs.GT)

A description of static equilibria in the noisy binary choice (Ising) game on complete and random graphs resulting from maximisation of the likelihood of system configurations is presented. An equivalence of such likelihood equilibria to the competitive Bayes-Nash quantal response expectation equilibria in the special case of consistent agents' expectations is established. It is shown that the same likelihood equilibria are obtained by considering the system's partition function.

[333]  arXiv:2405.21012 [pdf, other]
Title: G-Transformer for Conditional Average Potential Outcome Estimation over Time
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)

Estimating potential outcomes for treatments over time based on observational data is important for personalized decision-making in medicine. Yet, existing neural methods for this task suffer from either (a) bias or (b) large variance. In order to address both limitations, we introduce the G-transformer (GT). Our GT is a novel, neural end-to-end model designed for unbiased, low-variance estimation of conditional average potential outcomes (CAPOs) over time. Specifically, our GT is the first neural model to perform regression-based iterative G-computation for CAPOs in the time-varying setting. We evaluate the effectiveness of our GT across various experiments. In sum, this work represents a significant step towards personalized decision-making from electronic health records.

[334]  arXiv:2405.21013 [pdf, other]
Title: StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Text-rich images have significant and extensive value, deeply integrated into various aspects of human life. Notably, both visual cues and linguistic symbols in text-rich images play crucial roles in information transmission but are accompanied by diverse challenges. Therefore, the efficient and effective understanding of text-rich images is a crucial litmus test for the capability of Vision-Language Models. We have crafted an efficient vision-language model, StrucTexTv3, tailored to tackle various intelligent tasks for text-rich images. The significant design of StrucTexTv3 is presented in the following aspects: Firstly, we adopt a combination of an effective multi-scale reduced visual transformer and a multi-granularity token sampler (MG-Sampler) as a visual token generator, successfully solving the challenges of high-resolution input and complex representation learning for text-rich images. Secondly, we enhance the perception and comprehension abilities of StrucTexTv3 through instruction learning, seamlessly integrating various text-oriented tasks into a unified framework. Thirdly, we have curated a comprehensive collection of high-quality text-rich images, abbreviated as TIM-30M, encompassing diverse scenarios like incidental scenes, office documents, web pages, and screenshots, thereby improving the robustness of our model. Our method achieved SOTA results in text-rich image perception tasks, and significantly improved performance in comprehension tasks. Among multimodal models with LLM decoder of approximately 1.8B parameters, it stands out as a leader, which also makes deployment on edge devices feasible. In summary, the StrucTexTv3 model, featuring efficient structural design, outstanding performance, and broad adaptability, offers robust support for diverse intelligent application tasks involving text-rich images, thus exhibiting immense potential for widespread application.

[335]  arXiv:2405.21015 [pdf, other]
Title: The rising costs of training frontier AI models
Subjects: Computers and Society (cs.CY)

The costs of training frontier AI models have grown dramatically in recent years, but there is limited public data on the magnitude and growth of these expenses. This paper develops a detailed cost model to address this gap, estimating training costs using three approaches that account for hardware, energy, cloud rental, and staff expenses. The analysis reveals that the amortized cost to train the most compute-intensive models has grown precipitously at a rate of 2.4x per year since 2016 (95% CI: 2.0x to 3.1x). For key frontier models, such as GPT-4 and Gemini, the most significant expenses are AI accelerator chips and staff costs, each costing tens of millions of dollars. Other notable costs include server components (15-22%), cluster-level interconnect (9-13%), and energy consumption (2-6%). If the trend of growing development costs continues, the largest training runs will cost more than a billion dollars by 2027, meaning that only the most well-funded organizations will be able to finance frontier AI models.
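
To make the stated growth rate concrete, the sketch below turns a per-year multiplier into a cumulative growth factor and a doubling time. The function names are illustrative assumptions; only the 2.4x rate and its 95% interval (2.0x to 3.1x) come from the abstract, and no absolute dollar baseline is assumed.

```python
import math


def growth_factor(rate_per_year, years):
    """Cumulative multiplier after `years` of compounding at `rate_per_year`."""
    return rate_per_year ** years


def doubling_time_years(rate_per_year):
    """Years needed for the cost to double at a constant yearly multiplier."""
    return math.log(2) / math.log(rate_per_year)


# Central estimate and the interval bounds quoted in the abstract.
for rate in (2.0, 2.4, 3.1):
    print(rate, round(doubling_time_years(rate), 2), round(growth_factor(rate, 3), 1))
```

At the central estimate of 2.4x per year, costs double roughly every 0.8 years, which is why a constant multiplier compounds quickly over a few years.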

[336]  arXiv:2405.21016 [pdf, other]
Title: MpoxSLDNet: A Novel CNN Model for Detecting Monkeypox Lesions and Performance Comparison with Pre-trained Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Monkeypox virus (MPXV) is a zoonotic virus that poses a significant threat to public health, particularly in remote parts of Central and West Africa. Early detection of monkeypox lesions is crucial for effective treatment. However, due to its similarity with other skin diseases, monkeypox lesion detection is a challenging task. To detect monkeypox, many researchers have used various deep-learning models such as MobileNetV2, VGG16, ResNet50, InceptionV3, DenseNet121, EfficientNetB3, and Xception. However, these models often require high storage space due to their large size. This study aims to address these challenges by introducing a CNN model named MpoxSLDNet (Monkeypox Skin Lesion Detector Network) to facilitate early detection and categorization of Monkeypox lesions and Non-Monkeypox lesions in digital images. Our model represents a significant advancement in the field of monkeypox lesion detection by offering superior performance metrics, including precision, recall, F1-score, accuracy, and AUC, compared to traditional pre-trained models such as VGG16, ResNet50, and DenseNet121. The key novelty of our approach lies in MpoxSLDNet's ability to achieve high detection accuracy while requiring significantly less storage space than existing models. By addressing the challenge of high storage requirements, MpoxSLDNet presents a practical solution for early detection and categorization of monkeypox lesions in resource-constrained healthcare settings. In this study, we used the "Monkeypox Skin Lesion Dataset", comprising 1428 skin images of monkeypox lesions and 1764 skin images of Non-Monkeypox lesions. The dataset's limitations could potentially impact the model's ability to generalize to unseen cases. However, the MpoxSLDNet model achieved a validation accuracy of 94.56%, compared to 86.25%, 84.38%, and 67.19% for VGG16, DenseNet121, and ResNet50, respectively.

[337]  arXiv:2405.21018 [pdf, other]
Title: Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)

Large language models (LLMs) are being rapidly developed, and a key component of their widespread deployment is their safety-related alignment. Many red-teaming efforts aim to jailbreak LLMs, where among these efforts, the Greedy Coordinate Gradient (GCG) attack's success has led to a growing interest in the study of optimization-based jailbreaking techniques. Although GCG is a significant milestone, its attacking efficiency remains unsatisfactory. In this paper, we present several improved (empirical) techniques for optimization-based jailbreaks like GCG. We first observe that the single target template of "Sure" largely limits the attacking performance of GCG; given this, we propose to apply diverse target templates containing harmful self-suggestion and/or guidance to mislead LLMs. Besides, from the optimization aspects, we propose an automatic multi-coordinate updating strategy in GCG (i.e., adaptively deciding how many tokens to replace in each step) to accelerate convergence, as well as tricks like easy-to-hard initialisation. Then, we combine these improved technologies to develop an efficient jailbreak method, dubbed $\mathcal{I}$-GCG. In our experiments, we evaluate on a series of benchmarks (such as NeurIPS 2023 Red Teaming Track). The results demonstrate that our improved techniques can help GCG outperform state-of-the-art jailbreaking attacks and achieve nearly 100% attack success rate. The code is released at https://github.com/jiaxiaojunQAQ/I-GCG.

[338]  arXiv:2405.21021 [pdf, other]
Title: Beyond Conventional Parametric Modeling: Data-Driven Framework for Estimation and Prediction of Time Activity Curves in Dynamic PET Imaging
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Dynamical Systems (math.DS)

Dynamic Positron Emission Tomography (dPET) imaging and Time-Activity Curve (TAC) analyses are essential for understanding and quantifying the biodistribution of radiopharmaceuticals over time and space. Traditional compartmental modeling, while foundational, commonly struggles to fully capture the complexities of biological systems, including non-linear dynamics and variability. This study introduces an innovative data-driven neural network-based framework, inspired by Reaction Diffusion systems, designed to address these limitations. Our approach, which adaptively fits TACs from dPET, enables the direct calibration of diffusion coefficients and reaction terms from observed data, offering significant improvements in predictive accuracy and robustness over traditional methods, especially in complex biological scenarios. By more accurately modeling the spatio-temporal dynamics of radiopharmaceuticals, our method advances modeling of pharmacokinetic and pharmacodynamic processes, enabling new possibilities in quantitative nuclear medicine.

[339]  arXiv:2405.21022 [pdf, other]
Title: You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet
Comments: Technical report. Yiran Zhong is the corresponding author. The code is available at this https URL
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Linear attention mechanisms have gained prominence in causal language models due to their linear computational complexity and enhanced speed. However, the inherent decay mechanism in linear attention presents challenges when applied to multi-dimensional sequence modeling tasks, such as image processing and multi-modal learning. In these scenarios, the utilization of sequential scanning to establish a global receptive field necessitates multiple scans for multi-dimensional data, thereby leading to inefficiencies. This paper identifies the inefficiency caused by a multiplicative linear recurrence and proposes an efficient alternative additive linear recurrence to avoid the issue, as it can handle multi-dimensional data within a single scan. We further develop an efficient multi-dimensional sequential modeling framework called LightNet based on the new recurrence. Moreover, we present two new multi-dimensional linear relative positional encoding methods, MD-TPE and MD-LRPE to enhance the model's ability to discern positional information in multi-dimensional scenarios. Our empirical evaluations across various tasks, including image classification, image generation, bidirectional language modeling, and autoregressive language modeling, demonstrate the efficacy of LightNet, showcasing its potential as a versatile and efficient solution for multi-dimensional sequential modeling.

[340]  arXiv:2405.21025 [pdf, other]
Title: On reduction and parameter recovering of Petri's cycloids
Comments: 26 pages, 9 figures. arXiv admin note: text overlap with arXiv:2402.07303
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Cycloids are particular Petri nets for modelling processes of actions and events, belonging to the fundaments of Petri's general systems theory. Defined by four parameters they provide an algebraic formalism to describe strongly synchronized sequential processes. To further investigate their structure, reduction systems of cycloids are defined in the style of rewriting systems and properties of reduced cycloids are proved. In particular the recovering of cycloid parameters from their Petri net structure is derived.

[341]  arXiv:2405.21027 [pdf, other]
Title: Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles
Comments: 20 pages, 5 figures
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

For solving zero-sum games involving non-transitivity, a common approach is to maintain population policies to approximate the Nash Equilibrium (NE). Previous research has shown that the Policy Space Response Oracle (PSRO) is an effective multi-agent reinforcement learning framework for these games. However, repeatedly training new policies from scratch to approximate the Best Response (BR) to opponents' mixed policies at each iteration is inefficient and costly. While some PSRO methods initialize a new BR policy by inheriting from past BR policies, this approach limits the exploration of new policies, especially against challenging opponents. To address this issue, we propose Fusion-PSRO, which uses model fusion to initialize the policy for better approximation to BR. With Top-k probabilities from NE, we select high-quality base policies and fuse them into a new BR policy through model averaging. This approach allows the initialized policy to incorporate multiple expert policies, making it easier to handle difficult opponents compared to inheriting or initializing from scratch. Additionally, our method only modifies the policy initialization, enabling its application to nearly all PSRO variants without additional training overhead. Our experiments with non-transitive matrix games, Leduc poker, and the more complex Liar's Dice demonstrate that Fusion-PSRO enhances the performance of nearly all PSRO variants, achieving lower exploitability.
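
The initialization step described above (select the Top-k policies by NE probability, then average their parameters) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: policies are represented as flat parameter lists, the helper name `fuse_top_k` is hypothetical, and weighting by renormalized NE probabilities is an assumption, since the abstract only says "model averaging".

```python
def fuse_top_k(policies, ne_probs, k):
    """Average the parameters of the k policies with the highest NE probability.

    policies: list of equal-length parameter vectors (lists of floats).
    ne_probs: NE mixture probability of each policy.
    Weights are the renormalized NE probabilities (an assumption).
    """
    if not 1 <= k <= len(policies):
        raise ValueError("k must be between 1 and the number of policies")
    # Indices of the k most probable policies under the NE mixture.
    top = sorted(range(len(policies)), key=lambda i: ne_probs[i], reverse=True)[:k]
    total = sum(ne_probs[i] for i in top)
    if total > 0:
        weights = [ne_probs[i] / total for i in top]
    else:
        weights = [1.0 / k] * k  # fall back to a uniform average
    dim = len(policies[0])
    return [sum(w * policies[i][j] for w, i in zip(weights, top)) for j in range(dim)]


# Three toy policies with two parameters each; fuse the two most probable.
fused = fuse_top_k([[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]], [0.1, 0.3, 0.6], k=2)
```

The fused vector would then serve only as the starting point for best-response training, which matches the abstract's claim that nothing but the initialization changes.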

[342]  arXiv:2405.21028 [pdf, other]
Title: LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models
Comments: 17 pages. Code: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

When answering questions, LLMs can convey not only an answer, but a level of confidence about the answer being correct. This includes explicit confidence markers (e.g. giving a numeric score) as well as implicit markers, like an authoritative tone or elaborating with additional knowledge. For LLMs to be trustworthy knowledge sources, the confidence they convey should match their actual expertise; however, most current models tend towards overconfidence. To calibrate both implicit and explicit confidence markers, we introduce a pragmatic, listener-aware finetuning method (LACIE) that models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener. We cast calibration as preference optimization, creating data via a two-agent game, where a speaker model's outputs are judged by a simulated listener. We then finetune three LLMs (Mistral-7B, Llama3-8B, Llama3-70B) with LACIE, and show that the resulting models are better calibrated w.r.t. a simulated listener. Crucially, these trends transfer to human listeners, helping them correctly predict model correctness: we conduct a human evaluation where annotators accept or reject an LLM's answers, finding that training with LACIE results in 47% fewer incorrect answers being accepted while maintaining the same level of acceptance for correct answers. Furthermore, LACIE generalizes to another dataset, resulting in a large increase in truthfulness on TruthfulQA when trained on TriviaQA. Our analysis indicates that LACIE leads to a better confidence separation between correct and incorrect examples. Qualitatively, we find that a LACIE-trained model hedges more and implicitly signals certainty when it is correct by using an authoritative tone or including details. Finally, LACIE finetuning leads to an emergent increase in model abstention (e.g. saying "I don't know") for answers that are likely wrong.

[343]  arXiv:2405.21030 [pdf, other]
Title: Standards for Belief Representations in LLMs
Subjects: Artificial Intelligence (cs.AI)

As large language models (LLMs) continue to demonstrate remarkable abilities across various domains, computer scientists are developing methods to understand their cognitive processes, particularly concerning how (and if) LLMs internally represent their beliefs about the world. However, this field currently lacks a unified theoretical foundation to underpin the study of belief in LLMs. This article begins filling this gap by proposing adequacy conditions for a representation in an LLM to count as belief-like. We argue that, while the project of belief measurement in LLMs shares striking features with belief measurement as carried out in decision theory and formal epistemology, it also differs in ways that should change how we measure belief. Thus, drawing from insights in philosophy and contemporary practices of machine learning, we establish four criteria that balance theoretical considerations with practical constraints. Our proposed criteria include accuracy, coherence, uniformity, and use, which together help lay the groundwork for a comprehensive understanding of belief representation in LLMs. We draw on empirical work showing the limitations of using various criteria in isolation to identify belief representations.

[344]  arXiv:2405.21034 [pdf, other]
Title: Multirobot Watchman Routes in a Simple Polygon
Journal-ref: 36th Canadian Conference on Computational Geometry (CCCG 2024)
Subjects: Computational Geometry (cs.CG)

The well-known \textsc{Watchman Route} problem seeks a shortest route in a polygonal domain from which every point of the domain can be seen. In this paper, we study the cooperative variant of the problem, namely the \textsc{$k$-Watchmen Routes} problem, in a simple polygon $P$. We look at both the version in which the $k$ watchmen must collectively see all of $P$, and the quota version in which they must see a predetermined fraction of $P$'s area.
We give an exact pseudopolynomial time algorithm for the \textsc{$k$-Watchmen Routes} problem in a simple orthogonal polygon $P$ with the constraint that watchmen must move on axis-parallel segments, and there is a given common starting point on the boundary. Further, we give a fully polynomial-time approximation scheme and a constant-factor approximation for unconstrained movement. For the quota version, we give a constant-factor approximation in a simple polygon, utilizing the solution to the (single) \textsc{Quota Watchman Route} problem.

[345]  arXiv:2405.21036 [pdf, ps, other]
Title: A-PETE: Adaptive Prototype Explanations of Tree Ensembles
Subjects: Machine Learning (cs.LG)

The need for interpreting machine learning models is addressed through prototype explanations within the context of tree ensembles. An algorithm named Adaptive Prototype Explanations of Tree Ensembles (A-PETE) is proposed to automatise the selection of prototypes for these classifiers. Its unique characteristic is the use of a specialised distance measure and a modified k-medoid approach. Experiments demonstrated its competitive predictive accuracy with respect to earlier explanation algorithms. It also provides a sufficient number of prototypes for the purpose of interpreting the random forest classifier.

[346]  arXiv:2405.21040 [pdf, other]
Title: Direct Alignment of Language Models via Quality-Aware Self-Refinement
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Reinforcement Learning from Human Feedback (RLHF) has been commonly used to align the behaviors of Large Language Models (LLMs) with human preferences. Recently, a popular alternative is Direct Preference Optimization (DPO), which replaces an LLM-based reward model with the policy itself, thus obviating the need for extra memory and training time to learn the reward model. However, DPO does not consider the relative qualities of the positive and negative responses, and can lead to sub-optimal training outcomes. To alleviate this problem, we investigate the use of intrinsic knowledge within the on-the-fly fine-tuning LLM to obtain relative qualities and help to refine the loss function. Specifically, we leverage the knowledge of the LLM to design a refinement function to estimate the quality of both the positive and negative responses. We show that the constructed refinement function can help self-refine the loss function under mild assumptions. The refinement function is integrated into DPO and its variant Identity Preference Optimization (IPO). Experiments across various evaluators indicate that they can improve the performance of the fine-tuned models over DPO and IPO.
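
For context, the standard per-pair DPO loss is sketched below to show why it is blind to response quality: it depends only on the policy-versus-reference log-probability margin between the preferred and rejected responses. This is the widely used baseline objective, not the refinement function proposed in the paper, which the abstract does not specify; the argument names and the default `beta` are illustrative assumptions.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_w / logp_l: policy log-probabilities of the preferred / rejected response.
    ref_logp_w / ref_logp_l: the same quantities under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# With identical policy and reference models the margin is zero and the loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
```

Nothing in this expression distinguishes a pair whose responses are nearly equal in quality from a pair with a large quality gap, which is the limitation the abstract sets out to address.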

[347]  arXiv:2405.21042 [pdf, other]
Title: Comparing information content of representation spaces for disentanglement with VAE ensembles
Comments: Code: this https URL
Subjects: Machine Learning (cs.LG)

Disentanglement is the endeavour to use machine learning to divide information about a dataset into meaningful fragments. In practice these fragments are representation (sub)spaces, often the set of channels in the latent space of a variational autoencoder (VAE). Assessments of disentanglement predominantly employ metrics that are coarse-grained at the model level, but this approach can obscure much about the process of information fragmentation. Here we propose to study the learned channels in aggregate, as the fragments of information learned by an ensemble of repeat training runs. Additionally, we depart from prior work where measures of similarity between individual subspaces neglected the nature of data embeddings as probability distributions. Instead, we view representation subspaces as communication channels that perform a soft clustering of the data; consequently, we generalize two classic information-theoretic measures of similarity between clustering assignments to compare representation spaces. We develop a lightweight method of estimation based on fingerprinting representation subspaces by their ability to distinguish dataset samples, allowing us to identify, analyze, and leverage meaningful structure in ensembles of VAEs trained on synthetic and natural datasets. Using this fully unsupervised pipeline we identify "hotspots" in the space of information fragments: groups of nearly identical representation subspaces that appear repeatedly in an ensemble of VAEs, particularly as regularization is increased. Finally, we leverage the proposed methodology to achieve ensemble learning with VAEs, boosting the information content of a set of weak learners -- a capability not possible with previous methods of assessing channel similarity.

[348]  arXiv:2405.21043 [pdf, other]
Title: Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
Journal-ref: Proceedings of the 41st International Conference on Machine Learning, 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. Our condition is naturally satisfied for expected updates over the entire state-action space or learning with a batch of complete trajectories from episodic Markov decision processes. Notably, using only a target network or an over-parameterized model does not provide such a convergence guarantee. Additionally, we extend our results to learning with truncated trajectories, showing that convergence is achievable for all tasks with minor modifications, akin to value truncation for the final states in trajectories. Our primary result focuses on temporal difference estimation for prediction, providing high-probability value estimation error bounds and empirical analysis on Baird's counterexample and a Four-room task. Furthermore, we explore the control setting, demonstrating that similar convergence conditions apply to Q-learning.
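
As a rough illustration of the mechanism under study, the sketch below runs temporal difference updates with linear function approximation in which the bootstrap term is computed from a periodically synchronized copy of the weights (the target network). It is a minimal on-policy, tabular-feature toy and does not reproduce the over-parameterized or off-policy conditions analyzed in the paper; the function name, hyperparameters, and two-state chain are all assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def td_with_target_network(transitions, n_features, alpha=0.1, gamma=0.9,
                           sync_every=10, sweeps=200):
    """TD(0) value estimation with a linear model and a frozen target copy.

    transitions: list of (phi, reward, phi_next, done) tuples.
    """
    w = [0.0] * n_features
    w_target = list(w)
    step = 0
    for _ in range(sweeps):
        for phi, reward, phi_next, done in transitions:
            # Bootstrap from the frozen target weights, not the live ones.
            bootstrap = 0.0 if done else gamma * dot(w_target, phi_next)
            td_error = reward + bootstrap - dot(w, phi)
            w = [wi + alpha * td_error * pi for wi, pi in zip(w, phi)]
            step += 1
            if step % sync_every == 0:
                w_target = list(w)  # periodic synchronization
    return w


# Two-state episodic chain: s0 -> s1 (reward 0), s1 -> terminal (reward 1).
chain = [([1.0, 0.0], 0.0, [0.0, 1.0], False),
         ([0.0, 1.0], 1.0, [0.0, 0.0], True)]
weights = td_with_target_network(chain, n_features=2)
```

On this chain the estimates approach the true values V(s1) = 1 and V(s0) = 0.9; the interesting cases in the paper are those where the same update without a target copy can diverge.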

[349]  arXiv:2405.21044 [pdf, other]
Title: Designing for Fairness in Human-Robot Interactions
Authors: Houston Claure
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)

The foundation of successful human collaboration is deeply rooted in the principles of fairness. As robots are increasingly prevalent in various parts of society where they are working alongside groups and teams of humans, their ability to understand and act according to principles of fairness becomes crucial for their effective integration. This is especially critical when robots are part of multi-human teams, where they must make continuous decisions regarding the allocation of resources. These resources can be material, such as tools, or communicative, such as gaze direction, and must be distributed fairly among team members to ensure optimal team performance and healthy group dynamics. Therefore, our research focuses on understanding how robots can effectively participate within human groups by making fair decisions while contributing positively to group dynamics and outcomes. In this paper, I discuss advances toward ensuring that robots are capable of considering human notions of fairness in their decision-making.

[350]  arXiv:2405.21045 [pdf, ps, other]
Title: An Attention-Based Multi-Context Convolutional Encoder-Decoder Neural Network for Work Zone Traffic Impact Prediction
Subjects: Machine Learning (cs.LG)

Work zones are one of the major causes of non-recurrent traffic congestion and road incidents. Despite the significance of their impact, studies on predicting the traffic impact of work zones remain scarce. In this paper, we propose a data integration pipeline that enhances the utilization of work zone and traffic data from diversified platforms, and introduce a novel deep learning model to predict the traffic speed and incident likelihood during planned work zone events. The proposed model transforms traffic patterns into 2D space-time images for both model input and output and employs an attention-based multi-context convolutional encoder-decoder architecture to capture the spatial-temporal dependencies between work zone events and traffic variations. Trained and validated on four years of archived work zone traffic data from Maryland, USA, the model demonstrates superior performance over baseline models in predicting traffic speed, incident likelihood, and inferred traffic attributes such as queue length and congestion timings (i.e., start time and duration). Specifically, the proposed model outperforms the baseline models by reducing the prediction error of traffic speed by 5% to 34%, queue length by 11% to 29%, congestion timing by 6% to 17%, and increasing the accuracy of incident predictions by 5% to 7%. Consequently, this model offers substantial promise for enhancing the planning and traffic management of work zones.

[351]  arXiv:2405.21046 [pdf, other]
Title: Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

Reinforcement learning from human feedback (RLHF) has emerged as a central tool for language model alignment. We consider online exploration in RLHF, which exploits interactive access to human or AI feedback by deliberately encouraging the model to produce diverse, maximally informative responses. By allowing RLHF to confidently stray from the pre-trained model, online exploration offers the possibility of novel, potentially super-human capabilities, but its full potential as a paradigm for language model training has yet to be realized, owing to computational and statistical bottlenecks in directly adapting existing reinforcement learning techniques. We propose a new algorithm for online exploration in RLHF, Exploratory Preference Optimization (XPO), which is simple and practical -- a one-line change to (online) Direct Preference Optimization (DPO; Rafailov et al., 2023) -- yet enjoys the strongest known provable guarantees and promising empirical performance. XPO augments the DPO objective with a novel and principled exploration bonus, empowering the algorithm to explore outside the support of the initial model and human feedback data. In theory, we show that XPO is provably sample-efficient and converges to a near-optimal language model policy under natural exploration conditions, irrespective of whether the initial model has good coverage. Our analysis, which builds on the observation that DPO implicitly performs a form of $Q^{\star}$-approximation (or, Bellman error minimization), combines previously disparate techniques from language modeling and theoretical reinforcement learning in a serendipitous fashion through the perspective of KL-regularized Markov decision processes. Empirically, we find that XPO is more sample-efficient than non-exploratory DPO variants in a preliminary evaluation.

[352]  arXiv:2405.21047 [pdf, other]
Title: Grammar-Aligned Decoding
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate this problem by greedily restricting what tokens an LLM can output at each step to guarantee that the output matches a given constraint. Specifically, in grammar-constrained decoding (GCD), the LLM's output must follow a given grammar. In this paper we demonstrate that GCD techniques (and in general constrained decoding techniques) can distort the LLM's distribution, leading to outputs that are grammatical but appear with likelihoods that are not proportional to the ones given by the LLM, and so ultimately are low-quality. We call the problem of aligning sampling with a grammar constraint, grammar-aligned decoding (GAD), and propose adaptive sampling with approximate expected futures (ASAp), a decoding algorithm that guarantees the output to be grammatical while provably producing outputs that match the conditional probability of the LLM's distribution conditioned on the given grammar constraint. Our algorithm uses prior sample outputs to soundly overapproximate the future grammaticality of different output prefixes. Our evaluation on code generation and structured NLP tasks shows how ASAp often produces outputs with higher likelihood (according to the LLM's distribution) than existing GCD techniques, while still enforcing the desired grammatical constraints.
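
The distortion described above can be seen in a toy setting. The following is a minimal sketch, not the paper's ASAp algorithm: the two-token "language model", its probabilities, and the two-string "grammar" are all made-up assumptions chosen only to show that per-step token masking with renormalization does not, in general, reproduce the model's distribution conditioned on grammaticality.

```python
# Toy next-token model: P(first token), then P(second token | first token).
# All probabilities are invented for illustration.
FIRST = {"A": 0.9, "B": 0.1}
SECOND = {"A": {"a": 0.1, "x": 0.9}, "B": {"b": 1.0}}
GRAMMAR = {"Aa", "Bb"}  # the only strings the hypothetical grammar accepts


def lm_prob(s):
    """Unconstrained probability the toy model assigns to a two-token string."""
    return FIRST[s[0]] * SECOND[s[0]][s[1]]


def conditional_target():
    """Model distribution conditioned on the output being grammatical."""
    z = sum(lm_prob(s) for s in GRAMMAR)
    return {s: lm_prob(s) / z for s in GRAMMAR}


def gcd_distribution():
    """Per-step masking: drop tokens with no grammatical completion, then renormalize."""
    out = {}
    ok_first = {t: p for t, p in FIRST.items() if any(s[0] == t for s in GRAMMAR)}
    z1 = sum(ok_first.values())
    for t1, p1 in ok_first.items():
        ok_second = {t: p for t, p in SECOND[t1].items() if t1 + t in GRAMMAR}
        z2 = sum(ok_second.values())
        for t2, p2 in ok_second.items():
            out[t1 + t2] = (p1 / z1) * (p2 / z2)
    return out


target = conditional_target()
gcd = gcd_distribution()
for s in sorted(GRAMMAR):
    print(s, "target:", round(target[s], 4), "masked decoding:", round(gcd[s], 4))
```

In this toy case the masked decoder commits to the prefix `A` because it looks likely locally, even though most of the probability mass after `A` leads to ungrammatical strings, so the two distributions differ.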

[353]  arXiv:2405.21048 [pdf, other]
Title: Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
Comments: 22 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion models have emerged as a powerful tool for generating high-quality images from textual descriptions. Despite their successes, these models often exhibit limited diversity in the sampled images, particularly when sampling with a high classifier-free guidance weight. To address this issue, we present Kaleido, a novel approach that enhances the diversity of samples by incorporating autoregressive latent priors. Kaleido integrates an autoregressive language model that encodes the original caption and generates latent variables, serving as abstract and intermediary representations for guiding and facilitating the image generation process. In this paper, we explore a variety of discrete latent representations, including textual descriptions, detection bounding boxes, object blobs, and visual tokens. These representations diversify and enrich the input conditions to the diffusion models, enabling more diverse outputs. Our experimental results demonstrate that Kaleido effectively broadens the diversity of the generated image samples from a given textual description while maintaining high image quality. Furthermore, we show that Kaleido adheres closely to the guidance provided by the generated latent variables, demonstrating its capability to effectively control and direct the image generation process.

[354]  arXiv:2405.21050 [pdf, other]
Title: Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Adapting large-scale pre-trained generative models in a parameter-efficient manner is gaining traction. Traditional methods like low rank adaptation achieve parameter efficiency by imposing constraints but may not be optimal for tasks requiring high representation capacity. We propose a novel spectrum-aware adaptation framework for generative models. Our method adjusts both singular values and their basis vectors of pretrained weights. Using the Kronecker product and efficient Stiefel optimizers, we achieve parameter-efficient adaptation of orthogonal matrices. We introduce Spectral Orthogonal Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity. Extensive evaluations on text-to-image diffusion models demonstrate SODA's effectiveness, offering a spectrum-aware alternative to existing fine-tuning methods.

[355]  arXiv:2405.21051 [pdf, other]
Title: Good Modelling Software Practices
Comments: 1 Figure
Subjects: Software Engineering (cs.SE); Populations and Evolution (q-bio.PE)

In socio-environmental sciences, models are frequently used as tools to represent, understand, project and predict the behaviour of these complex systems. Along the modelling chain, Good Modelling Practices have been evolving that ensure -- amongst others -- that models are transparent and replicable. Whenever such models are represented in software, good modelling meets Good Software Practices, such as a tractable development workflow, good code, collaborative development and governance, continuous integration and deployment, and Good Scientific Practices, such as attribution of copyrights and acknowledgement of intellectual property, publication of a software paper and archiving. Too often in existing socio-environmental model software, these practices have been regarded as an add-on to be considered at a later stage only; in fact, many modellers have shied away from publishing their model as open source out of fear that having to add good practices is too demanding. We here argue for making a habit of following a list of simple and not so simple practices early on in the implementation of the model life cycle. We contextualise cherry-picked and hands-on practices for supporting Good Modelling Practices, and we demonstrate their application in the example context of the Viable North Sea fisheries socio-ecological systems model.

[356]  arXiv:2405.21055 [pdf, ps, other]
Title: Factors Influencing Performance of Students in Software Automated Test Tools Course
Comments: 8 pages
Journal-ref: 17th IEEE International Conference of Software Testing, Verification and Validation Workshops (ICSTW-2024)
Subjects: Software Engineering (cs.SE)

Formal software testing education is important for building efficient QA professionals. Various aspects of quality assurance approaches are usually covered in courses for training software testing students. Automated Test Tools is one of the core courses in the software testing post-graduate curriculum due to the high demand for automated testers in the workforce. It is important to understand which factors are affecting student performance in the automated testing course to be able to assist the students early on based on their needs. Various metrics that are considered for predicting student performance in this testing course are student engagement, grades on individual deliverables, and prerequisite courses. This study identifies the impact of assessing students based on individual vs. group activities, theoretical vs. practical components, and the effect of having taken prerequisite courses on their final grade. To carry out this research, student data was collected from the automated test tools course of a community college-based postgraduate certificate program in software testing. The dataset contained student records from the years 2021 to 2022 and consisted of information from five different semesters. Various machine learning algorithms were applied to develop an effective model for predicting student performance in the automated software testing tools course, and finally, important features affecting student performance were identified. The predictive model for the automated test tools course developed by applying the logistic regression technique showed the best performance, with an accuracy score of 90%.

[357]  arXiv:2405.21056 [pdf, other]
Title: An Organic Weed Control Prototype using Directed Energy and Deep Learning
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Organic weed control is vital to improving crop yield with a sustainable approach. In this work, a directed energy weed control robot prototype specifically designed for organic farms is proposed. The robot uses a novel distributed array robot (DAR) unit for weed treatment. Soybean and corn databases are built to train deep learning neural nets to perform weed recognition. The initial deep learning neural nets show a high performance in classifying crops. The robot uses a patented directed energy plant eradication recipe that is completely organic and UV-C free, with no chemical damage or physical disturbance to the soil. The deep learning model can classify 8 common weed species in a soybean field in a natural environment with up to 98% accuracy.

[358]  arXiv:2405.21059 [pdf, other]
Title: Unified Directly Denoising for Both Variance Preserving and Variance Exploding Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Previous work has demonstrated that, in the Variance Preserving (VP) scenario, the nascent Directly Denoising Diffusion Models (DDDM) can generate high-quality images in one step while achieving even better performance in multistep sampling. However, the Pseudo-LPIPS loss used in DDDM leads to concerns about the bias in assessment. Here, we propose a unified DDDM (uDDDM) framework that generates images in one-step/multiple steps for both Variance Preserving (VP) and Variance Exploding (VE) cases. We provide theoretical proofs of the existence and uniqueness of the model's solution paths, as well as the non-intersecting property of the sampling paths. Additionally, we propose an adaptive Pseudo-Huber loss function to balance the convergence to the true solution and the stability of the convergence process. Through a comprehensive evaluation, we demonstrate that uDDDMs achieve FID scores comparable to the best-performing methods available for CIFAR-10 in both VP and VE. Specifically, uDDDM achieves one-step generation on CIFAR-10 with FID of 2.63 and 2.53 for VE and VP respectively. By extending the sampling to 1000 steps, we further reduce FID score to 1.71 and 1.65 for VE and VP respectively, setting state-of-the-art performance in both cases.

[359]  arXiv:2405.21060 [pdf, other]
Title: Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Authors: Tri Dao, Albert Gu
Comments: ICML 2024
Subjects: Machine Learning (cs.LG)

While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.
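
As a small illustration of the matrix view mentioned above, the sketch below checks numerically that a scalar-state linear recurrence produces the same output as multiplying the input by a lower-triangular matrix. This is a toy under stated assumptions, not the Mamba-2 layer or the SSD algorithm: the recurrence form `h_t = a_t * h_{t-1} + b_t * x_t`, `y_t = c_t * h_t`, the function names, and the random parameters are all chosen here for illustration.

```python
import numpy as np


def ssm_recurrent(a, b, c, x):
    """Run the assumed scalar recurrence step by step (linear in sequence length)."""
    h = 0.0
    y = np.empty_like(x)
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]
        y[t] = c[t] * h
    return y


def ssm_matrix(a, b, c):
    """Materialize the same map as a lower-triangular matrix M with y = M @ x.

    M[i, j] = c[i] * (a[j+1] * ... * a[i]) * b[j] for i >= j, and 0 above the diagonal.
    """
    T = len(a)
    M = np.zeros((T, T))
    for i in range(T):
        for j in range(i + 1):
            # np.prod of an empty slice is 1.0, which covers the diagonal case i == j.
            M[i, j] = c[i] * np.prod(a[j + 1 : i + 1]) * b[j]
    return M


rng = np.random.default_rng(0)
T = 6
a, b, c, x = (rng.uniform(0.1, 0.9, T) for _ in range(4))
M = ssm_matrix(a, b, c)
y_rec = ssm_recurrent(a, b, c, x)
y_mat = M @ x
print(np.allclose(y_rec, y_mat))
```

The recurrent form processes one step at a time, while the matrix form exposes the whole sequence map at once, which is the kind of structure an attention-like computation can exploit.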

[360]  arXiv:2405.21061 [pdf, other]
Title: Graph External Attention Enhanced Transformer
Comments: In Proceedings of ICML 2024
Subjects: Machine Learning (cs.LG)

The Transformer architecture has recently gained considerable attention in the field of graph representation learning, as it naturally overcomes several limitations of Graph Neural Networks (GNNs) with customized attention mechanisms or positional and structural encodings. Despite making some progress, existing works tend to overlook external information of graphs, specifically the correlation between graphs. Intuitively, graphs with similar structures should have similar representations. Therefore, we propose Graph External Attention (GEA) -- a novel attention mechanism that leverages multiple external node/edge key-value units to capture inter-graph correlations implicitly. On this basis, we design an effective architecture called Graph External Attention Enhanced Transformer (GEAET), which integrates local structure and global interaction information for more comprehensive graph representations. Extensive experiments on benchmark datasets demonstrate that GEAET achieves state-of-the-art empirical performance. The source code is available for reproducibility at: https://github.com/icm1018/GEAET.

[361]  arXiv:2405.21063 [pdf, other]
Title: Neural Network Verification with Branch-and-Bound for General Nonlinearities
Comments: Preprint
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Branch-and-bound (BaB) is among the most effective methods for neural network (NN) verification. However, existing works on BaB have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB for general nonlinearities in general computational graphs based on linear bound propagation. To decide which neuron to branch, we design a new branching heuristic which leverages linear bounds as shortcuts to efficiently estimate the potential improvement after branching. To decide nontrivial branching points for general nonlinear functions, we propose to optimize branching points offline, which can be efficiently leveraged during verification with a lookup table. We demonstrate the effectiveness of our GenBaB on verifying a wide range of NNs, including networks with activation functions such as Sigmoid, Tanh, Sine and GeLU, as well as networks involving multi-dimensional nonlinear operations such as multiplications in LSTMs and Vision Transformers. Our framework also allows the verification of general nonlinear computation graphs and enables verification applications beyond simple neural networks, particularly for AC Optimal Power Flow (ACOPF). GenBaB is part of the latest $\alpha,\!\beta$-CROWN, the winner of the 4th International Verification of Neural Networks Competition (VNN-COMP 2023).

[362]  arXiv:2405.21064 [pdf, other]
Title: Recurrent neural networks: vanishing and exploding gradients are not the end of the story
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

Recurrent neural networks (RNNs) notoriously struggle to learn long-term memories, primarily due to vanishing and exploding gradients. The recent success of state-space models (SSMs), a subclass of RNNs, to overcome such difficulties challenges our theoretical understanding. In this paper, we delve into the optimization challenges of RNNs and discover that, as the memory of a network increases, changes in its parameters result in increasingly large output variations, making gradient-based learning highly sensitive, even without exploding gradients. Our analysis further reveals the importance of the element-wise recurrence design pattern combined with careful parametrizations in mitigating this effect. This feature is present in SSMs, as well as in other architectures, such as LSTMs. Overall, our insights provide a new explanation for some of the difficulties in gradient-based learning of RNNs and why some architectures perform better than others.

[363]  arXiv:2405.21066 [pdf, other]
Title: Mixed Diffusion for 3D Indoor Scene Synthesis
Comments: 19 pages, 14 figures. Under review. Code to be released at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Realistic conditional 3D scene synthesis significantly enhances and accelerates the creation of virtual environments, which can also provide extensive training data for computer vision and robotics research among other applications. Diffusion models have shown great performance in related applications, e.g., making precise arrangements of unordered sets. However, these models have not been fully explored in floor-conditioned scene synthesis problems. We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture, designed to synthesize plausible 3D indoor scenes from given room types, floor plans, and potentially pre-existing objects. We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation. Our approach uniquely implements structured corruption across the mixed discrete semantic and continuous geometric domains, resulting in a better conditioned problem for the reverse denoising step. We evaluate our approach on the 3D-FRONT dataset. Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis. In addition, our models can handle partial object constraints via a corruption-and-masking strategy without task specific training. We show MiDiffusion maintains clear advantages over existing approaches in scene completion and furniture arrangement experiments.

[364]  arXiv:2405.21068 [pdf, other]
Title: Code Pretraining Improves Entity Tracking Abilities of Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Recent work has provided indirect evidence that pretraining language models on code improves the ability of models to track state changes of discourse entities expressed in natural language. In this work, we systematically test this claim by comparing pairs of language models on their entity tracking performance. Critically, the pairs consist of base models and models trained on top of these base models with additional code data. We extend this analysis to additionally examine the effect of math training, another highly structured data type, and alignment tuning, an important step for enhancing the usability of models. We find clear evidence that models additionally trained on large amounts of code outperform the base models. On the other hand, we find no consistent benefit of additional math training or alignment tuning across various model families.

[365]  arXiv:2405.21070 [pdf, other]
Title: Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)

Severe data imbalance naturally exists among web-scale vision-language datasets. Despite this, we find CLIP pre-trained thereupon exhibits notable robustness to the data imbalance compared to supervised learning, and demonstrates significant effectiveness in learning generalizable representations. With an aim to investigate the reasons behind this finding, we conduct controlled experiments to study various underlying factors, and reveal that CLIP's pretext task forms a dynamic classification problem wherein only a subset of classes is present in training. This isolates the bias from dominant classes and implicitly balances the learning signal. Furthermore, the robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts, which are inaccessible to supervised learning. Our study not only uncovers the mechanisms behind CLIP's generalizability beyond data imbalance but also provides transferable insights for the research community. The findings are validated in both supervised and self-supervised learning, enabling models trained on imbalanced data to achieve CLIP-level performance on diverse recognition tasks. Code will be available at: https://github.com/CVMI-Lab/clip-beyond-tail.

[366]  arXiv:2405.21074 [pdf, other]
Title: Latent Intrinsics Emerge from Training to Relight
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image relighting is the task of showing what a scene from a source image would look like if illuminated differently. Inverse graphics schemes recover an explicit representation of geometry and a set of chosen intrinsics, then relight with some form of renderer. However, error control for inverse graphics is difficult, and inverse graphics methods can represent only the effects of the chosen intrinsics. This paper describes a relighting method that is entirely data-driven, where intrinsics and lighting are each represented as latent variables. Our approach produces SOTA relightings of real scenes, as measured by standard metrics. We show that albedo can be recovered from our latent intrinsics without using any example albedos, and that the albedos recovered are competitive with SOTA methods.

[367]  arXiv:2405.21075 [pdf, other]
Title: Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding. The potential of MLLMs in processing sequential visual data is still insufficiently explored, highlighting the absence of a comprehensive, high-quality assessment of their performance. In this paper, we introduce Video-MME, the first-ever full-spectrum, Multi-Modal Evaluation benchmark of MLLMs in Video analysis. Our work distinguishes itself from existing benchmarks through four key features: 1) Diversity in video types, spanning 6 primary visual domains with 30 subfields to ensure broad scenario generalizability; 2) Duration in temporal dimension, encompassing short-, medium-, and long-term videos, ranging from 11 seconds to 1 hour, for robust contextual dynamics; 3) Breadth in data modalities, integrating multi-modal inputs besides video frames, including subtitles and audio, to unveil the all-round capabilities of MLLMs; 4) Quality in annotations, utilizing rigorous manual labeling by expert annotators to facilitate precise and reliable model assessment. 900 videos with a total of 256 hours are manually selected and annotated by repeatedly viewing all the video content, resulting in 2,700 question-answer pairs. With Video-MME, we extensively evaluate various state-of-the-art MLLMs, including GPT-4 series and Gemini 1.5 Pro, as well as open-source image models like InternVL-Chat-V1.5 and video models like LLaVA-NeXT-Video. Our experiments reveal that Gemini 1.5 Pro is the best-performing commercial model, significantly outperforming the open-source models. Our dataset along with these findings underscores the need for further improvements in handling longer sequences and multi-modal data. Project Page: https://video-mme.github.io

Cross-lists for Mon, 3 Jun 24

[368]  arXiv:1709.00668 (cross-list from stat.ML) [pdf, other]
Title: SamBaTen: Sampling-based Batch Incremental Tensor Decomposition
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Tensor decompositions are invaluable tools in analyzing multimodal datasets. In many real-world scenarios, such datasets are far from being static; on the contrary, they tend to grow over time. For instance, in an online social network setting, as we observe new interactions over time, our dataset gets updated in its "time" mode. How can we maintain a valid and accurate tensor decomposition of such a dynamically evolving multimodal dataset, without having to re-compute the entire decomposition after every single update? In this paper we introduce SaMbaTen, a Sampling-based Batch Incremental Tensor Decomposition algorithm, which incrementally maintains the decomposition given new updates to the tensor dataset. SaMbaTen is able to scale to datasets that the state-of-the-art in incremental tensor decomposition is unable to operate on, due to its ability to effectively summarize the existing tensor and the incoming updates, and perform all computations in the reduced summary space. We extensively evaluate SaMbaTen using synthetic and real datasets. Indicatively, SaMbaTen achieves comparable accuracy to state-of-the-art incremental and non-incremental techniques, while being 25-30 times faster. Furthermore, SaMbaTen scales to very large sparse and dense dynamically evolving tensors of dimensions up to 100K x 100K x 100K where state-of-the-art incremental approaches were not able to operate.

[369]  arXiv:2402.04264 (cross-list from cond-mat.dis-nn) [pdf, other]
Title: Analysis of Hopfield Model as Associative Memory
Authors: Matteo Silvestri
Comments: 35 pages, 23 figures, 3 codes
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Information Retrieval (cs.IR)

This article delves into the Hopfield neural network model, drawing inspiration from biological neural systems. The exploration begins with an overview of the model's foundations, incorporating insights from statistical mechanics to deepen our understanding. Focusing on audio retrieval, the study demonstrates the Hopfield model's associative memory capabilities. Through practical implementation, the network is trained to retrieve different patterns.
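
To make the associative-memory idea concrete, here is a minimal sketch of a Hopfield network. It is not the article's code: the Hebbian storage rule, the synchronous sign update, the function names, and the two 8-unit binary patterns (stand-ins for the audio data the article uses) are assumptions chosen for illustration.

```python
import numpy as np


def train_hebbian(patterns):
    """Build the weight matrix from +/-1 patterns with the Hebbian outer-product rule."""
    p = np.asarray(patterns, dtype=float)
    n = p.shape[1]
    w = p.T @ p / n
    np.fill_diagonal(w, 0.0)  # no self-connections
    return w


def recall(w, state, steps=5):
    """Apply synchronous sign updates until the state stops changing or steps run out."""
    s = np.asarray(state, dtype=float)
    for _ in range(steps):
        new = np.where(w @ s >= 0, 1.0, -1.0)
        if np.array_equal(new, s):
            break
        s = new
    return s


# Two orthogonal toy patterns to store.
stored = np.array([
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
], dtype=float)
w = train_hebbian(stored)

# Corrupt the first pattern by flipping one unit, then let the network clean it up.
noisy = stored[0].copy()
noisy[0] *= -1
recovered = recall(w, noisy)
print(np.array_equal(recovered, stored[0]))
```

The corrupted input acts as a partial cue, and the update dynamics pull the state back to the nearest stored pattern, which is the retrieval behaviour the abstract refers to.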

[370]  arXiv:2405.20348 (cross-list from physics.ao-ph) [pdf, other]
Title: Personalized Adapter for Large Meteorology Model on Devices: Towards Weather Foundation Models
Comments: 42 pages, under review
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)

This paper demonstrates that pre-trained language models (PLMs) are strong foundation models for on-device meteorological variables modeling. We present LM-Weather, a generic approach to taming PLMs that have learned massive sequential knowledge from the universe of natural language databases, to acquire an immediate capability to obtain highly customized models for heterogeneous meteorological data on devices while keeping high efficiency. Concretely, we introduce a lightweight personalized adapter into PLMs and endow it with weather pattern awareness. During communication between clients and the server, low-rank-based transmission is performed to effectively fuse the global knowledge among devices while maintaining high communication efficiency and ensuring privacy. Experiments on real-world data show that LM-Weather outperforms the state-of-the-art results by a large margin across various tasks (e.g., forecasting and imputation at different scales). We provide extensive and in-depth analysis experiments, which verify that LM-Weather can (1) indeed leverage sequential knowledge from natural language to accurately handle meteorological sequences, (2) allow each device to obtain highly customized models under significant heterogeneity, and (3) generalize under data-limited and out-of-distribution (OOD) scenarios.

[371]  arXiv:2405.20368 (cross-list from math.CO) [pdf, ps, other]
Title: Sphere packing proper colorings of an expander graph
Authors: Honglin Zhu
Comments: 17 pages, 2 figures
Subjects: Combinatorics (math.CO); Information Theory (cs.IT)

We introduce a new notion of error-correcting codes on $[q]^n$ where a code is a set of proper $q$-colorings of some fixed $n$-vertex graph $G$. For a pair of proper $q$-colorings $X, Y$ of $G$, we define their distance as the minimum Hamming distance between $X$ and $\sigma(Y)$ over all $\sigma \in S_q$. We then say that a set of proper $q$-colorings of $G$ is $\delta$-distinct if any pair of colorings in the set have distance at least $\delta n$.
We investigate how one-sided spectral expansion relates to the largest possible set of $\delta$-distinct colorings on a graph. For fixed $(\delta, \lambda) \in [0, 1] \times [-1, 1]$ and positive integer $d$, let $f_{\delta, \lambda, d}(n)$ denote the maximal size of a set of $\delta$-distinct colorings of any $d$-regular graph on at most $n$ vertices with normalized second eigenvalue at most $\lambda$. We study the growth of $f$ as $n$ goes to infinity. We partially characterize regimes of $(\delta, \lambda)$ where $f$ grows exponentially, is finite, and is at most $1$, respectively. We also prove several sharp phase transitions between these regimes.
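
The distance defined above can be computed directly for small $q$ by brute force over all color permutations. The sketch below is only an illustration of the definition; the function names and the example colorings are hypothetical, colors are assumed to be encoded as integers $0, \dots, q-1$, and the graph itself is not checked (the inputs are assumed to be proper colorings already).

```python
from itertools import combinations, permutations


def coloring_distance(x, y, q):
    """Minimum Hamming distance between x and sigma(y) over all permutations sigma of the q colors."""
    best = len(x)
    for sigma in permutations(range(q)):
        # sigma[b] is the color that b is relabeled to.
        d = sum(1 for a, b in zip(x, y) if a != sigma[b])
        best = min(best, d)
    return best


def is_delta_distinct(colorings, q, delta):
    """Check that every pair of colorings has distance at least delta * n."""
    n = len(colorings[0])
    return all(
        coloring_distance(x, y, q) >= delta * n
        for x, y in combinations(colorings, 2)
    )


# y is x with the colors renamed, so the distance is 0.
x = [0, 1, 2, 0]
y = [1, 2, 0, 1]
print(coloring_distance(x, y, 3))

# No relabeling of two colors aligns these; both permutations leave 2 mismatches.
u = [0, 1, 0, 1]
v = [0, 1, 1, 0]
print(coloring_distance(u, v, 2))
```

The brute-force loop costs $q!$ Hamming comparisons, so it is only practical for small $q$; it is meant to clarify the definition rather than to scale.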

[372]  arXiv:2405.20384 (cross-list from cond-mat.quant-gas) [pdf, other]
Title: Recurrent neural network wave functions for Rydberg atom arrays on kagome lattice
Comments: 13 pages, 5 figures, 3 tables. Link to GitHub repository: this https URL
Subjects: Quantum Gases (cond-mat.quant-gas); Disordered Systems and Neural Networks (cond-mat.dis-nn); Strongly Correlated Electrons (cond-mat.str-el); Machine Learning (cs.LG); Quantum Physics (quant-ph)

Rydberg atom array experiments have demonstrated the ability to act as powerful quantum simulators, preparing strongly-correlated phases of matter which are challenging to study for conventional computer simulations. A key direction has been the implementation of interactions on frustrated geometries, in an effort to prepare exotic many-body states such as spin liquids and glasses. In this paper, we apply two-dimensional recurrent neural network (RNN) wave functions to study the ground states of Rydberg atom arrays on the kagome lattice. We implement an annealing scheme to find the RNN variational parameters in regions of the phase diagram where exotic phases may occur, corresponding to rough optimization landscapes. For Rydberg atom array Hamiltonians studied previously on the kagome lattice, our RNN ground states show no evidence of exotic spin liquid or emergent glassy behavior. In the latter case, we argue that the presence of a non-zero Edwards-Anderson order parameter is an artifact of the long autocorrelation times experienced with quantum Monte Carlo simulations. This result emphasizes the utility of autoregressive models, such as RNNs, to explore Rydberg atom array physics on frustrated lattices and beyond.

[373]  arXiv:2405.20389 (cross-list from astro-ph.IM) [pdf, other]
Title: Designing an Evaluation Framework for Large Language Models in Astronomy Research
Comments: 7 pages, 3 figures. Code available at this https URL
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

Large Language Models (LLMs) are shifting how scientific research is done. It is imperative to understand how researchers interact with these models and how scientific sub-communities like astronomy might benefit from them. However, there is currently no standard for evaluating the use of LLMs in astronomy. Therefore, we present the experimental design for an evaluation study on how astronomy researchers interact with LLMs. We deploy a Slack chatbot that can answer queries from users via Retrieval-Augmented Generation (RAG); these responses are grounded in astronomy papers from arXiv. We record and anonymize user questions and chatbot answers, user upvotes and downvotes to LLM responses, user feedback to the LLM, and retrieved documents and similarity scores with the query. Our data collection method will enable future dynamic evaluations of LLM tools for astronomy.

[374]  arXiv:2405.20392 (cross-list from eess.IV) [pdf, other]
Title: Can No-Reference Quality-Assessment Methods Serve as Perceptual Losses for Super-Resolution?
Comments: 4 pages, 3 figures. The first two authors contributed equally to this work
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Perceptual losses play an important role in constructing deep-neural-network-based methods by increasing the naturalness and realism of processed images and videos. Use of perceptual losses is often limited to LPIPS, a full-reference method. Even though deep no-reference image-quality-assessment methods are excellent at predicting human judgment, little research has examined their incorporation in loss functions. This paper investigates direct optimization of several video-super-resolution models using no-reference image-quality-assessment methods as perceptual losses. Our experimental results show that straightforward optimization of these methods produces artifacts, but a special training procedure can mitigate them.

[375]  arXiv:2405.20400 (cross-list from stat.ME) [pdf, other]
Title: Fast leave-one-cluster-out cross-validation by clustered Network Information Criteria (NICc)
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)

This paper introduces a clustered estimator of the Network Information Criterion (NICc) to approximate leave-one-cluster-out cross-validated deviance, which can be used as an alternative to cluster-based cross-validation when modeling clustered data. Stone proved that the Akaike Information Criterion (AIC) is asymptotically equivalent to leave-one-observation-out cross-validation if the parametric model is true. Ripley pointed out that the Network Information Criterion (NIC), derived in Stone's proof, is a better approximation to leave-one-observation-out cross-validation when the model is not true. For clustered data, we derive a clustered estimator of NIC, referred to as NICc, by substituting the Fisher information matrix in NIC with its estimator that adjusts for clustering. This adjustment imposes a larger penalty in NICc than the unclustered estimator of NIC when modeling clustered data, thereby preventing overfitting more effectively. In a simulation study and an empirical example, we use linear and logistic regression to model clustered data with Gaussian or binomial response, respectively. We show that NICc is a better approximation to leave-one-cluster-out deviance and prevents overfitting more effectively than AIC and the Bayesian Information Criterion (BIC). NICc leads to more accurate model selection, as determined by cluster-based cross-validation, compared to AIC and BIC.
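
As a rough illustration of the quantity NICc is meant to approximate, the following sketch computes a brute-force leave-one-cluster-out squared error for a one-predictor linear model. It is not the NICc estimator itself. The function names `fit_line` and `leave_one_cluster_out_sse` are hypothetical, and it assumes a Gaussian response with fixed variance, in which case the deviance is proportional to the squared error.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_cluster_out_sse(xs, ys, clusters):
    """Sum of squared prediction errors, holding out one whole cluster at a time.

    Assumption: Gaussian response with fixed variance, so this sum is
    proportional to the leave-one-cluster-out deviance.
    """
    total = 0.0
    for held_out in sorted(set(clusters)):
        train = [(x, y) for x, y, c in zip(xs, ys, clusters) if c != held_out]
        test = [(x, y) for x, y, c in zip(xs, ys, clusters) if c == held_out]
        intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])
        total += sum((y - (intercept + slope * x)) ** 2 for x, y in test)
    return total
```

The model is refit once per cluster, which is the cost that an information-criterion approximation such as NICc avoids.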

[376]  arXiv:2405.20402 (cross-list from eess.AS) [pdf, other]
Title: Cross-Talk Reduction
Comments: in International Joint Conference on Artificial Intelligence (IJCAI), 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)

While far-field multi-talker mixtures are recorded, each speaker can wear a close-talk microphone so that close-talk mixtures can be recorded at the same time. Although each close-talk mixture has a high signal-to-noise ratio (SNR) of the wearer, it has a very limited range of applications, as it also contains significant cross-talk speech by other speakers and is not clean enough. In this context, we propose a novel task named cross-talk reduction (CTR) which aims at reducing cross-talk speech, and a novel solution named CTRnet which is based on unsupervised or weakly-supervised neural speech separation. In unsupervised CTRnet, close-talk and far-field mixtures are stacked as input for a DNN to estimate the close-talk speech of each speaker. It is trained in an unsupervised, discriminative way such that the DNN estimate for each speaker can be linearly filtered to cancel out the speaker's cross-talk speech captured at other microphones. In weakly-supervised CTRnet, we assume the availability of each speaker's activity timestamps during training, and leverage them to improve the training of unsupervised CTRnet. Evaluation results on a simulated two-speaker CTR task and on a real-recorded conversational speech separation and recognition task show the effectiveness and potential of CTRnet.

[377]  arXiv:2405.20407 (cross-list from physics.ins-det) [pdf, other]
Title: Convolutional L2LFlows: Generating Accurate Showers in Highly Granular Calorimeters Using Convolutional Normalizing Flows
Subjects: Instrumentation and Detectors (physics.ins-det); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); High Energy Physics - Phenomenology (hep-ph); Data Analysis, Statistics and Probability (physics.data-an)

In the quest to build generative surrogate models as computationally efficient alternatives to rule-based simulations, the quality of the generated samples remains a crucial frontier. So far, normalizing flows have been among the models with the best fidelity. However, as the latent space in such models is required to have the same dimensionality as the data space, scaling up normalizing flows to high dimensional datasets is not straightforward. The prior L2LFlows approach successfully used a series of separate normalizing flows and a sequence of conditioning steps to circumvent this problem. In this work, we extend L2LFlows to simulate showers with a 9-times larger profile in the lateral direction. To achieve this, we introduce convolutional layers and U-Net-type connections, move from masked autoregressive flows to coupling layers, and demonstrate the successful modelling of showers in the ILD Electromagnetic Calorimeter as well as Dataset 3 from the public CaloChallenge dataset.

[378]  arXiv:2405.20447 (cross-list from stat.ML) [pdf, other]
Title: Algorithmic Fairness in Performative Policy Learning: Escaping the Impossibility of Group Fairness
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG)

In many prediction problems, the predictive model affects the distribution of the prediction target. This phenomenon is known as performativity and is often caused by the behavior of individuals with vested interests in the outcome of the predictive model. Although performativity is generally problematic because it manifests as distribution shifts, we develop algorithmic fairness practices that leverage performativity to achieve stronger group fairness guarantees in social classification problems (compared to what is achievable in non-performative settings). In particular, we leverage the policymaker's ability to steer the population to remedy inequities in the long term. A crucial benefit of this approach is that it is possible to resolve the incompatibilities between conflicting group fairness definitions.

[379]  arXiv:2405.20451 (cross-list from stat.ML) [pdf, other]
Title: Statistical Properties of Robust Satisficing
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

The Robust Satisficing (RS) model is an emerging approach to robust optimization, offering streamlined procedures and robust generalization across various applications. However, the statistical theory of RS remains unexplored in the literature. This paper fills in the gap by comprehensively analyzing the theoretical properties of the RS model. Notably, the RS structure offers a more straightforward path to deriving statistical guarantees compared to the seminal Distributionally Robust Optimization (DRO), resulting in a richer set of results. In particular, we establish two-sided confidence intervals for the optimal loss without the need to solve a minimax optimization problem explicitly. We further provide finite-sample generalization error bounds for the RS optimizer. Importantly, our results extend to scenarios involving distribution shifts, where discrepancies exist between the sampling and target distributions. Our numerical experiments show that the RS model consistently outperforms the baseline empirical risk minimization in small-sample regimes and under distribution shifts. Furthermore, compared to the DRO model, the RS model exhibits lower sensitivity to hyperparameter tuning, highlighting its practicability for robustness considerations.

[380]  arXiv:2405.20500 (cross-list from math.OC) [pdf, other]
Title: Hybrid Reinforcement Learning Framework for Mixed-Variable Problems
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

Optimization problems characterized by both discrete and continuous variables are common across various disciplines, presenting unique challenges due to their complex solution landscapes and the difficulty of navigating mixed-variable spaces effectively. To address these challenges, we introduce a hybrid Reinforcement Learning (RL) framework that synergizes RL for discrete variable selection with Bayesian Optimization for continuous variable adjustment. This framework stands out by its strategic integration of RL and continuous optimization techniques, enabling it to dynamically adapt to the problem's mixed-variable nature. By employing RL for exploring discrete decision spaces and Bayesian Optimization to refine continuous parameters, our approach not only demonstrates flexibility but also enhances optimization performance. Our experiments on synthetic functions and real-world machine learning hyperparameter tuning tasks reveal that our method consistently outperforms traditional RL, random search, and standalone Bayesian optimization in terms of effectiveness and efficiency.

[381]  arXiv:2405.20559 (cross-list from physics.optics) [pdf, other]
Title: Universal evaluation and design of imaging systems using information estimation
Subjects: Optics (physics.optics); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Image and Video Processing (eess.IV); Data Analysis, Statistics and Probability (physics.data-an)

Information theory, which describes the transmission of signals in the presence of noise, has enabled the development of reliable communication systems that underlie the modern world. Imaging systems can also be viewed as a form of communication, in which information about the object is "transmitted" through images. However, the application of information theory to imaging systems has been limited by the challenges of accounting for their physical constraints. Here, we introduce a framework that addresses these limitations by modeling the probabilistic relationship between objects and their measurements. Using this framework, we develop a method to estimate information using only a dataset of noisy measurements, without making any assumptions about the image formation process. We demonstrate that these estimates comprehensively quantify measurement quality across a diverse range of imaging systems and applications. Furthermore, we introduce Information-Driven Encoder Analysis Learning (IDEAL), a technique to optimize the design of imaging hardware for maximum information capture. This work provides new insights into the fundamental performance limits of imaging systems and offers powerful new tools for their analysis and design.

[382]  arXiv:2405.20591 (cross-list from q-bio.PE) [pdf, other]
Title: Weak-Form Inference for Hybrid Dynamical Systems in Ecology
Subjects: Populations and Evolution (q-bio.PE); Machine Learning (cs.LG); Dynamical Systems (math.DS)

Species subject to predation and environmental threats commonly exhibit variable periods of population boom and bust over long timescales. Understanding and predicting such behavior, especially given the inherent heterogeneity and stochasticity of exogenous driving factors over short timescales, is an ongoing challenge. A modeling paradigm gaining popularity in the ecological sciences for such multi-scale effects is to couple short-term continuous dynamics to long-term discrete updates. We develop a data-driven method utilizing weak-form equation learning to extract such hybrid governing equations for population dynamics and to estimate the requisite parameters using sparse intermittent measurements of the discrete and continuous variables. The method produces a set of short-term continuous dynamical system equations parametrized by long-term variables, and long-term discrete equations parametrized by short-term variables, allowing direct assessment of interdependencies between the two time scales. We demonstrate the utility of the method on a variety of ecological scenarios and provide extensive tests using models previously derived for epizootics experienced by the North American spongy moth (Lymantria dispar dispar).

[383]  arXiv:2405.20662 (cross-list from math.FA) [pdf, ps, other]
Title: Oscillations and differences in Besov-Morrey and Besov-type spaces
Comments: 45 pages. arXiv admin note: text overlap with arXiv:2306.15239
Subjects: Functional Analysis (math.FA); Numerical Analysis (math.NA)

In this paper we investigate Besov-Morrey spaces $\mathcal{N}^{s}_{u,p,q}(\Omega)$ and Besov-type spaces $B^{s,\tau}_{p,q}(\Omega)$ of positive smoothness defined on Lipschitz domains $\Omega \subset \mathbb{R}^d$ as well as on $\mathbb{R}^d$. We combine the Hedberg-Netrusov approach to function spaces with distinguished kernel representations due to Triebel, in order to derive novel characterizations of these scales in terms of local oscillations provided that some standard conditions concerning the parameters are fulfilled. In connection with that we also obtain new characterizations of $\mathcal{N}^{s}_{u,p,q}(\Omega)$ and $B^{s,\tau}_{p,q}(\Omega)$ via differences of higher order. As a by-product, we recover and extend corresponding results for the scale of classical Besov spaces $B^{s}_{p,q}(\Omega)$.
Key words: Besov-Morrey space, Besov-type space, Morrey space, Lipschitz domain, oscillations, higher order differences

[384]  arXiv:2405.20668 (cross-list from q-bio.BM) [pdf, other]
Title: Improving Paratope and Epitope Prediction by Multi-Modal Contrastive Learning and Interaction Informativeness Estimation
Comments: This paper is accepted by IJCAI 2024
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

Accurately predicting antibody-antigen binding residues, i.e., paratopes and epitopes, is crucial in antibody design. However, existing methods solely focus on uni-modal data (either sequence or structure), disregarding the complementary information present in multi-modal data, and most methods predict paratopes and epitopes separately, overlooking their specific spatial interactions. In this paper, we propose a novel Multi-modal contrastive learning and Interaction informativeness estimation-based method for Paratope and Epitope prediction, named MIPE, by using both sequence and structure data of antibodies and antigens. MIPE implements a multi-modal contrastive learning strategy, which maximizes representations of binding and non-binding residues within each modality and meanwhile aligns uni-modal representations towards effective modal representations. To exploit the spatial interaction information, MIPE also incorporates an interaction informativeness estimation that computes the estimated interaction matrices between antibodies and antigens, thereby approximating them to the actual ones. Extensive experiments demonstrate the superiority of our method compared to baselines. Additionally, the ablation studies and visualizations demonstrate the superiority of MIPE owing to the better representations acquired through multi-modal contrastive learning and the interaction patterns comprehended by the interaction informativeness estimation.

[385]  arXiv:2405.20693 (cross-list from eess.IV) [pdf, other]
Title: R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R$^2$-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover a previously unknown integration bias in the standard 3DGS formulation, which hampers accurate volume retrieval. To address this issue, we propose a novel rectification technique via refactoring the projection from 3D to 2D Gaussians. Our new method presents three key innovations: (1) introducing tailored Gaussian kernels, (2) extending rasterization to X-ray imaging, and (3) developing a CUDA-based differentiable voxelizer. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches by 0.93 dB in PSNR and 0.014 in SSIM. Crucially, it delivers high-quality results in 3 minutes, which is 12x faster than NeRF-based methods and on par with traditional algorithms. The superior performance and rapid convergence of our method highlight its practical value.

[386]  arXiv:2405.20799 (cross-list from stat.ML) [pdf, other]
Title: Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures
Comments: Preprint. Under review. arXiv admin note: text overlap with arXiv:2403.10288
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Time-series data in real-world settings typically exhibit long-range dependencies and are observed at non-uniform intervals. In these settings, traditional sequence-based recurrent models struggle. To overcome this, researchers often replace recurrent architectures with Neural ODE-based models to account for irregularly sampled data and use Transformer-based architectures to account for long-range dependencies. Despite the success of these two approaches, both incur very high computational costs for input sequences of even moderate length. To address this challenge, we introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences and incurs significantly lower computational costs. In particular, we propose \textit{multi-view signature attention}, which uses path signatures to augment vanilla attention and to capture both local and global (multi-scale) dependencies in the input data, while remaining robust to changes in the sequence length and sampling frequency and yielding improved spatial processing. We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the representational benefits of Neural ODE-based models, all at a fraction of the computational time and memory resources.

[387]  arXiv:2405.20825 (cross-list from physics.med-ph) [pdf, ps, other]
Title: Analysis of clinical, dosimetric and radiomic features for predicting local failure after stereotactic radiotherapy of brain metastases in malignant melanoma
Subjects: Medical Physics (physics.med-ph); Machine Learning (cs.LG)

Background: The aim of this study was to investigate the role of clinical, dosimetric and pretherapeutic magnetic resonance imaging (MRI) features for lesion-specific outcome prediction of stereotactic radiotherapy (SRT) in patients with brain metastases from malignant melanoma (MBM).
Methods: In this multicenter, retrospective analysis, we reviewed 517 MBM from 130 patients treated with SRT (single fraction or hypofractionated). For each gross tumor volume (GTV) 1576 radiomic features (RF) were calculated (788 each for the GTV and for a 3 mm margin around the GTV). Clinical parameters, radiation dose and RF from pretherapeutic contrast-enhanced T1-weighted MRI from different institutions were evaluated with a feature processing and elimination pipeline in a nested cross-validation scheme.
Results: Seventy-two (72) of 517 lesions (13.9%) showed a local failure (LF) after SRT. The processing pipeline showed clinical, dosimetric and radiomic features providing information for LF prediction. The most prominent ones were the correlation of the gray level co-occurrence matrix of the margin (hazard ratio (HR): 0.37, confidence interval (CI): 0.23-0.58) and systemic therapy before SRT (HR: 0.55, CI: 0.42-0.70). The majority of RF associated with LF was calculated in the margin around the GTV.
Conclusions: Pretherapeutic MRI based RF connected with lesion-specific outcome after SRT could be identified, despite multicentric data and minor differences in imaging protocols. Image data analysis of the surrounding metastatic environment may provide therapy-relevant information with the potential to further individualize radiotherapy strategies.

[388]  arXiv:2405.20863 (cross-list from q-bio.BM) [pdf, other]
Title: ABodyBuilder3: Improved and scalable antibody structure predictions
Comments: 8 pages, 3 figures, 3 tables, code available at this https URL, weights and data available at this https URL
Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI)

Accurate prediction of antibody structure is a central task in the design and development of monoclonal antibodies, notably to understand both their developability and their binding properties. In this article, we introduce ABodyBuilder3, an improved and scalable antibody structure prediction model based on ImmuneBuilder. We achieve a new state-of-the-art accuracy in the modelling of CDR loops by leveraging language model embeddings, and show how predicted structures can be further improved through careful relaxation strategies. Finally, we incorporate a predicted Local Distance Difference Test into the model output to allow for a more accurate estimation of uncertainties.

[389]  arXiv:2405.20904 (cross-list from math.CO) [pdf, ps, other]
Title: Solving systems of equations on antichains for the computation of the ninth Dedekind Number
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

We study three systems of equations, together with a way to count the number of solutions. One of the results was used in the recent computation of D(9); the others have the potential to speed up existing techniques in the future.

[390]  arXiv:2405.20910 (cross-list from physics.app-ph) [pdf, other]
Title: Predicting ptychography probe positions using single-shot phase retrieval neural network
Subjects: Applied Physics (physics.app-ph); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an)

Ptychography is a powerful imaging technique that is used in a variety of fields, including materials science, biology, and nanotechnology. However, the accuracy of the reconstructed ptychography image is highly dependent on the accuracy of the recorded probe positions, which often contain errors. These errors are typically corrected jointly with phase retrieval through numerical optimization approaches. When the error accumulates along the scan path or when the error magnitude is large, these approaches may not converge to a satisfactory result. We propose a fundamentally new approach for ptychography probe position prediction for data with large position errors, where a neural network is used to perform single-shot phase retrieval on individual diffraction patterns, yielding the object image at each scan point. The pairwise offsets among these images are then found using a robust image registration method, and the results are combined to yield the complete scan path by constructing and solving a linear equation. We show that our method can achieve good position prediction accuracy for data with large and accumulating errors on the order of $10^2$ pixels, a magnitude that often makes optimization-based algorithms fail to converge. For ptychography instruments without sophisticated position control equipment such as interferometers, our method is of significant practical potential.
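
The final step described above, combining pairwise offsets into a scan path by solving a linear equation, can be sketched as a small least-squares problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `solve_linear` and `positions_from_offsets` are hypothetical, each measured offset is assumed to approximate `p[j] - p[i]`, and point 0 is pinned at the origin to remove the global translation ambiguity.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    width = len(rhs[0])
    aug = [list(matrix[r]) + list(rhs[r]) for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("offsets do not connect every scan point")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [[aug[r][n + k] / aug[r][r] for k in range(width)] for r in range(n)]


def positions_from_offsets(n_points, pairs, offsets):
    """Least-squares positions from pairwise offsets, offsets[k] ~ p[j] - p[i].

    Builds the normal equations (A^T A) p = A^T b, where each row of A has
    -1 in column i and +1 in column j, plus one anchor row fixing p[0] = 0.
    """
    dim = len(offsets[0])
    normal = [[0.0] * n_points for _ in range(n_points)]
    rhs = [[0.0] * dim for _ in range(n_points)]
    for (i, j), offset in zip(pairs, offsets):
        normal[i][i] += 1.0
        normal[j][j] += 1.0
        normal[i][j] -= 1.0
        normal[j][i] -= 1.0
        for k in range(dim):
            rhs[i][k] -= offset[k]
            rhs[j][k] += offset[k]
    normal[0][0] += 1.0  # anchor row: p[0] = 0
    return solve_linear(normal, rhs)
```

Using redundant pairs (not only consecutive scan points) means an error in one registration is averaged against the others rather than accumulating along the path.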

[391]  arXiv:2405.20970 (cross-list from stat.ML) [pdf, other]
Title: PUAL: A Classifier on Trifurcate Positive-Unlabeled Data
Comments: 24 pages, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Positive-unlabeled (PU) learning aims to train a classifier using data containing only labeled-positive instances and unlabeled instances. However, existing PU learning methods generally struggle to achieve satisfactory performance on trifurcate data, where the positive instances are distributed on both sides of the negative instances. To address this issue, we first propose a PU classifier with asymmetric loss (PUAL), by introducing a structure of asymmetric loss on positive instances into the objective function of the global and local learning classifier. Then we develop a kernel-based algorithm to enable PUAL to obtain a non-linear decision boundary. We show through experiments on both simulated and real-world datasets that PUAL can achieve satisfactory classification on trifurcate data.

[392]  arXiv:2405.20999 (cross-list from math.DS) [pdf, other]
Title: Towards a Fluid computer
Comments: 11 pages, 3 figures
Subjects: Dynamical Systems (math.DS); Computation and Language (cs.CL); Analysis of PDEs (math.AP); Symplectic Geometry (math.SG)

In 1991, Moore [20] raised a question about whether hydrodynamics is capable of performing computations. Similarly, in 2016, Tao [25] asked whether a mechanical system, including a fluid flow, can simulate a universal Turing machine. In this expository article, we review the construction in [8] of a "Fluid computer" in dimension 3 that combines techniques in symbolic dynamics with the connection between steady Euler flows and contact geometry unveiled by Etnyre and Ghrist. In addition, we argue that the metric that renders the vector field Beltrami cannot be critical in the Chern-Hamilton sense [9]. We also sketch the completely different construction for the Euclidean metric in $\mathbb R^3$ as given in [7]. These results reveal the existence of undecidable fluid particle paths. We conclude the article with a list of open problems.

[393]  arXiv:2405.21023 (cross-list from math.OC) [pdf, other]
Title: Compact Optimality Verification for Optimization Proxies
Comments: International Conference on Machine Learning 2024
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI)

Recent years have witnessed increasing interest in optimization proxies, i.e., machine learning models that approximate the input-output mapping of parametric optimization problems and return near-optimal feasible solutions. Following recent work by Nellikkath & Chatzivasileiadis (2021), this paper reconsiders the optimality verification problem for optimization proxies, i.e., the determination of the worst-case optimality gap over the instance distribution. The paper proposes a compact formulation for optimality verification and a gradient-based primal heuristic that brings substantial computational benefits to the original formulation. The compact formulation is also more general and applies to non-convex optimization problems. The benefits of the compact formulation are demonstrated on large-scale DC Optimal Power Flow and knapsack problems.

Replacements for Mon, 3 Jun 24

[394]  arXiv:1709.04044 (replaced) [pdf, ps, other]
Title: Spectral ACMS: A robust localized Approximated Component Mode Synthesis Method
Subjects: Numerical Analysis (math.NA)
[395]  arXiv:2002.01605 (replaced) [pdf, ps, other]
Title: Exploratory Machine Learning with Unknown Unknowns
Comments: published at Artificial Intelligence, preliminary conference version published at AAAI'21
Journal-ref: Artificial Intelligence, Volume 327, 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[396]  arXiv:2101.06998 (replaced) [pdf, ps, other]
Title: An FPT algorithm for Matching Cut and d-cut
Subjects: Data Structures and Algorithms (cs.DS)
[397]  arXiv:2103.03636 (replaced) [pdf, other]
Title: CoDeGAN: Contrastive Disentanglement for Generative Adversarial Network
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[398]  arXiv:2103.05621 (replaced) [pdf, other]
Title: The Common Intuition to Transfer Learning Can Win or Lose: Case Studies for Linear Regression
Subjects: Machine Learning (cs.LG)
[399]  arXiv:2107.07726 (replaced) [pdf, other]
Title: Double Glueing over Free Exponential: with Measure Theoretic Applications
Authors: Masahiro Hamano
Subjects: Logic in Computer Science (cs.LO); Category Theory (math.CT)
[400]  arXiv:2110.09197 (replaced) [pdf, ps, other]
Title: On the Completeness and Complexity of the Lifted Dynamic Junction Tree Algorithm
Authors: Marcel Gehrke
Comments: StaRAI 2021
Subjects: Artificial Intelligence (cs.AI)
[401]  arXiv:2110.10927 (replaced) [pdf, other]
Title: SecureBoost+ : A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[402]  arXiv:2111.14302 (replaced) [pdf, other]
Title: Self-supervised Feature-Gate Coupling for Dynamic Network Pruning
Comments: Pattern Recognition, Preprint
Journal-ref: Pattern Recognition, Volume 154, 2024, 110594.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[403]  arXiv:2203.13655 (replaced) [pdf, other]
Title: Gransformer: Transformer-based Graph Generation
Subjects: Machine Learning (cs.LG)
[404]  arXiv:2204.09140 (replaced) [pdf, other]
Title: Multi-hop Question Answering
Authors: Vaibhav Mavi (New York University, United States of America), Anubhav Jangra (Indian Institute of Technology, Patna, India), Adam Jatowt (University of Innsbruck, Austria)
Comments: Published at Foundations and Trends in Information Retrieval
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[405]  arXiv:2205.00825 (replaced) [pdf, other]
Title: Stochastic Online Fisher Markets: Static Pricing Limits and Adaptive Enhancements
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Theoretical Economics (econ.TH); Optimization and Control (math.OC)
[406]  arXiv:2205.11518 (replaced) [pdf, other]
Title: LIA: Privacy-Preserving Data Quality Evaluation in Federated Learning Using a Lazy Influence Approximation
Comments: A preliminary version of this work received the Best Paper Award at the International Workshop on Trustworthy Federated Learning at IJCAI (FL-IJCAI) 2023
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[407]  arXiv:2207.11860 (replaced) [pdf, other]
Title: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation
Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Extended version of CVPR 2022 paper arXiv:2203.01452. Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)
[408]  arXiv:2209.08757 (replaced) [pdf, ps, other]
Title: Parameterized Complexity of Path Set Packing
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
[409]  arXiv:2211.04634 (replaced) [pdf, other]
Title: Learning Optimal Graph Filters for Clustering of Attributed Graphs
Comments: 12 pages, 7 figures
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Signal Processing (eess.SP)
[410]  arXiv:2211.06166 (replaced) [pdf, other]
Title: Mathematical Modelling of Neuroblast Chemotaxis Migration towards the Olfactory Bulb
Subjects: Numerical Analysis (math.NA)
[411]  arXiv:2211.10636 (replaced) [pdf, other]
Title: EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
Comments: Accepted by ICML 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[412]  arXiv:2211.10737 (replaced) [pdf, other]
Title: Accuracy Booster: Enabling 4-bit Fixed-point Arithmetic for DNN Training
Subjects: Machine Learning (cs.LG)
[413]  arXiv:2211.13862 (replaced) [pdf, ps, other]
Title: Generalized Convolution Quadrature for non smooth sectorial problems
Comments: 30 pages, 22 figures
Subjects: Numerical Analysis (math.NA)
[414]  arXiv:2211.16168 (replaced) [pdf, other]
Title: Robust boundary integral equations for the solution of elastic scattering problems via Helmholtz decompositions
Authors: V. Dominguez, C. Turc
Comments: 36 pages, 10 figures
Subjects: Numerical Analysis (math.NA)
[415]  arXiv:2212.00394 (replaced) [pdf, other]
Title: From CNNs to Shift-Invariant Twin Models Based on Complex Wavelets
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[416]  arXiv:2212.07946 (replaced) [pdf, other]
Title: Active Inference and Reinforcement Learning: A unified inference on continuous state and action spaces under partial observability
Comments: 90 pages including appendices
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[417]  arXiv:2212.11580 (replaced) [pdf, ps, other]
Title: A Theory of Conversion Relations for Prefixed Units of Measure
Subjects: Programming Languages (cs.PL); Discrete Mathematics (cs.DM)
[418]  arXiv:2212.14041 (replaced) [pdf, other]
Title: Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective
Comments: Accepted by ICML 2024
Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[419]  arXiv:2301.06650 (replaced) [pdf, other]
Title: Enhancing Deep Traffic Forecasting Models with Dynamic Regression
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[420]  arXiv:2301.11074 (replaced) [pdf, ps, other]
Title: Digital Inheritance in Web3: A Case Study of Soulbound Tokens and the Social Recovery Pallet within the Polkadot and Kusama Ecosystems
Subjects: Cryptography and Security (cs.CR)
[421]  arXiv:2301.13734 (replaced) [pdf, other]
Title: Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design
Subjects: Machine Learning (cs.LG)
[422]  arXiv:2303.00728 (replaced) [pdf, other]
Title: On the universality of $S_n$-equivariant $k$-body gates
Comments: 7+15 pages, 3+5 figures, updated to published version
Journal-ref: New J. Phys. 26, 053030 (2024)
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)
[423]  arXiv:2303.06689 (replaced) [pdf, other]
Title: Self-planning Code Generation with Large Language Models
Comments: Accepted by TOSEM
Subjects: Software Engineering (cs.SE)
[424]  arXiv:2304.01716 (replaced) [pdf, other]
Title: Decoupling Dynamic Monocular Videos for Dynamic View Synthesis
Authors: Meng You, Junhui Hou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[425]  arXiv:2304.07248 (replaced) [pdf, ps, other]
Title: The University of California San Francisco Brain Metastases Stereotactic Radiosurgery (UCSF-BMSR) MRI Dataset
Comments: 15 pages, 2 tables, 2 figures
Journal-ref: Radiology: Artificial Intelligence. 2024;6(2):e230126
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[426]  arXiv:2304.10101 (replaced) [pdf, other]
Title: Federated Compositional Deep AUC Maximization
Subjects: Machine Learning (cs.LG)
[427]  arXiv:2305.00664 (replaced) [pdf, other]
Title: EvoluNet: Advancing Dynamic Non-IID Transfer Learning on Graphs
Comments: Accepted at ICML 2024
Subjects: Machine Learning (cs.LG)
[428]  arXiv:2305.01461 (replaced) [pdf, other]
Title: Mixed-Integer Optimal Control via Reinforcement Learning: A Case Study on Hybrid Electric Vehicle Energy Management
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
[429]  arXiv:2305.02736 (replaced) [pdf, other]
Title: Separability and Non-Determinizability of WSTS
Comments: This paper is the journal version of the CONCUR23 paper "Separability and Non-Determinizability of WSTS", which can be found in the previous version of this arXiv document. It covers both papers about regular separability in WSTS: the CONCUR23 paper and its predecessor the CONCUR18 paper. As this version does not contain an appendix, please refer to the previous version for missing proofs
Subjects: Formal Languages and Automata Theory (cs.FL)
[430]  arXiv:2305.07845 (replaced) [pdf, other]
Title: Understanding and Improving Model Averaging in Federated Learning on Heterogeneous Data
Comments: To appear in IEEE Transactions on Mobile Computing. Code is available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[431]  arXiv:2305.08116 (replaced) [pdf, ps, other]
Title: The Structure and Dynamics of Knowledge Graphs, with Superficiality
Subjects: Artificial Intelligence (cs.AI)
[432]  arXiv:2305.09938 (replaced) [pdf, other]
Title: Mastering Long-Tail Complexity on Graphs: Characterization, Learning, and Generalization
Comments: Accepted at KDD 2024
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[433]  arXiv:2305.10225 (replaced) [pdf, other]
Title: New and improved bounds on the contextuality degree of multi-qubit configurations
Comments: 22 pages, 5 figures, 2 tables, published by Cambridge University Press in Mathematical Structures in Computer Science
Journal-ref: Mathematical Structures in Computer Science. Published online 2024:1-22
Subjects: Quantum Physics (quant-ph); Discrete Mathematics (cs.DM); Symplectic Geometry (math.SG)
[434]  arXiv:2305.15255 (replaced) [pdf, other]
Title: Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Comments: ICLR 2024 camera-ready
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[435]  arXiv:2305.15805 (replaced) [pdf, other]
Title: Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[436]  arXiv:2306.02617 (replaced) [pdf, other]
Title: Permutation Decision Trees
Comments: 15 pages, 8 figures
Subjects: Machine Learning (cs.LG)
[437]  arXiv:2306.03322 (replaced) [pdf, other]
Title: Decentralized Multi-Level Compositional Optimization Algorithms with Level-Independent Convergence Rate
Authors: Hongchang Gao
Subjects: Machine Learning (cs.LG)
[438]  arXiv:2306.04325 (replaced) [pdf, other]
Title: A Perspective Study on Chinese Social Media regarding LLM for Education and Beyond
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
[439]  arXiv:2306.05501 (replaced) [pdf, other]
Title: Robust Explainer Recommendation for Time Series Classification
Comments: Accepted for publication in Data Mining and Knowledge Discovery
Subjects: Machine Learning (cs.LG)
[440]  arXiv:2306.07856 (replaced) [pdf, other]
Title: Bayesian Program Learning by Decompiling Amortized Knowledge
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
[441]  arXiv:2306.08595 (replaced) [pdf, other]
Title: TensorKrowch: Smooth integration of tensor networks in machine learning
Comments: 20 pages, 2 figures. The TensorKrowch GitHub repository is in this https URL and the TensorKrowch documentation is in this https URL. V3: Accepted version, corrected acknowledgments
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Strongly Correlated Electrons (cond-mat.str-el); Quantum Physics (quant-ph)
[442]  arXiv:2306.08970 (replaced) [pdf, other]
Title: An Efficient and Multi-private Key Secure Aggregation for Federated Learning
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[443]  arXiv:2306.10843 (replaced) [pdf, other]
Title: Female mosquito detection by means of AI techniques inside release containers in the context of a Sterile Insect Technique program
Comments: Accepted EUSIPCO 2024
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[444]  arXiv:2307.08187 (replaced) [pdf, other]
Title: An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[445]  arXiv:2307.16128 (replaced) [pdf, other]
Title: Online Interior-point Methods for Time-varying Equality-constrained Optimization
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[446]  arXiv:2307.16565 (replaced) [pdf, other]
Title: Towards Imbalanced Motion: Part-Decoupling Network for Video Portrait Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[447]  arXiv:2308.01399 (replaced) [pdf, other]
Title: Learning to Model the World with Language
Comments: ICML 2024. Website: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[448]  arXiv:2308.09310 (replaced) [pdf, other]
Title: Variance reduction techniques for stochastic proximal point algorithms
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[449]  arXiv:2308.12985 (replaced) [pdf, ps, other]
Title: Perimeter Control with Heterogeneous Metering Rates for Cordon Signals: A Physics-Regularized Multi-Agent Reinforcement Learning Approach
Comments: 21 pages, 24 figures
Subjects: Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
[450]  arXiv:2308.14143 (replaced) [pdf, other]
Title: Ensemble-localized Kernel Density Estimation with Applications to the Ensemble Gaussian Mixture Filter
Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA); Applications (stat.AP)
[451]  arXiv:2308.14906 (replaced) [pdf, other]
Title: BayOTIDE: Bayesian Online Multivariate Time series Imputation with functional decomposition
Comments: Accepted by The 41st International Conference on Machine Learning (ICML 2024)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[452]  arXiv:2309.02691 (replaced) [pdf, other]
Title: A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Comments: This was published in TMLR in 2024, on January 24th
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[453]  arXiv:2309.05660 (replaced) [pdf, other]
Title: Hypothesis Search: Inductive Reasoning with Language Models
Comments: ICLR 2024. The first two authors contributed equally. Code: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[454]  arXiv:2309.07709 (replaced) [pdf, other]
Title: Safe Aerial Manipulator Maneuvering and Force Exertion via Control Barrier Functions
Subjects: Robotics (cs.RO)
[455]  arXiv:2309.14512 (replaced) [pdf, ps, other]
Title: Byzantine-Resilient Federated PCA and Low Rank Column-wise Sensing
Comments: 36 pages
Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)
[456]  arXiv:2309.16476 (replaced) [pdf, other]
Title: High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality
Comments: 13 pages + Supplementary information
Subjects: Statistics Theory (math.ST); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Machine Learning (stat.ML)
[457]  arXiv:2310.00154 (replaced) [pdf, other]
Title: Primal Dual Continual Learning: Balancing Stability and Plasticity through Adaptive Memory Allocation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[458]  arXiv:2310.00835 (replaced) [pdf, other]
Title: TRAM: Benchmarking Temporal Reasoning for Large Language Models
Authors: Yuqing Wang, Yun Zhao
Comments: Findings of ACL 2024
Subjects: Computation and Language (cs.CL)
[459]  arXiv:2310.01765 (replaced) [pdf, other]
Title: Data Cleaning and Machine Learning: A Systematic Literature Review
Comments: Published in the Automated Software Engineering Journal
Subjects: Machine Learning (cs.LG); Databases (cs.DB)
[460]  arXiv:2310.02223 (replaced) [pdf, ps, other]
Title: Query-Based Sampling of Heterogeneous CTMCs: Modeling and Optimization with Binary Freshness
Comments: 10 pages, 6 figures
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Performance (cs.PF)
[461]  arXiv:2310.02905 (replaced) [pdf, other]
Title: Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers
Comments: Accepted to ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[462]  arXiv:2310.05764 (replaced) [pdf, other]
Title: Harmonic Self-Conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design
Comments: Published at ICML 2024. (Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[463]  arXiv:2310.07557 (replaced) [pdf, other]
Title: Quality of Service-Constrained Online Routing in High Throughput Satellites
Comments: Added constraints and updated numerical results. Layout improvement
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[464]  arXiv:2310.11548 (replaced) [pdf, other]
Title: Differentially Private Data Generation with Missing Data
Comments: 18 pages, 9 figures, 2 tables
Journal-ref: PVLDB Volume 17, 2024
Subjects: Databases (cs.DB); Cryptography and Security (cs.CR)
[465]  arXiv:2310.13397 (replaced) [pdf, other]
Title: Equivariant Deep Weight Space Alignment
Comments: ICML 2024
Subjects: Machine Learning (cs.LG)
[466]  arXiv:2310.18339 (replaced) [pdf, other]
Title: When MOE Meets LLMs: Parameter Efficient Fine-tuning for Multi-task Medical Applications
Comments: accepted by SIGIR'24
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[467]  arXiv:2310.18953 (replaced) [pdf, other]
Title: TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression
Comments: ICML 2024. Please feel free to provide feedback!
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[468]  arXiv:2311.00177 (replaced) [pdf, other]
Title: Students' Perspective on AI Code Completion: Benefits and Challenges
Comments: Accepted at COMPSAC 2024 Workshop (The 7th IEEE International Workshop on Advances in Artificial Intelligence and Machine Learning: AI & ML for a Sustainable and Better Future)
Subjects: Software Engineering (cs.SE)
[469]  arXiv:2311.01479 (replaced) [pdf, other]
Title: Detecting Out-of-Distribution Through the Lens of Neural Collapse
Authors: Litian Liu, Yao Qin
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[470]  arXiv:2311.01906 (replaced) [pdf, other]
Title: Simplifying Transformer Blocks
Comments: ICLR 2024
Subjects: Machine Learning (cs.LG)
[471]  arXiv:2311.03732 (replaced) [pdf, other]
Title: Learning to Learn for Few-shot Continual Active Learning
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[472]  arXiv:2311.06958 (replaced) [pdf, other]
Title: Towards Climate Variable Prediction with Conditioned Spatio-Temporal Normalizing Flows
Comments: 5 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[473]  arXiv:2311.06983 (replaced) [pdf, other]
Title: A Different View of Sigma-Delta Modulators Under the Lens of Pulse Frequency Modulation
Authors: Victor Medina (1), Pieter Rombouts (2), Luis Hernandez (1) ((1) Carlos III University, Madrid, Spain. (2) Ghent University, Belgium.)
Comments: 15 pages, 28 figures
Subjects: Systems and Control (eess.SY)
[474]  arXiv:2311.09510 (replaced) [pdf, other]
Title: Tailoring with Targeted Precision: Edit-Based Agents for Open-Domain Procedure Customization
Comments: Camera ready version accepted to Findings of ACL 2024
Subjects: Computation and Language (cs.CL)
[475]  arXiv:2311.10879 (replaced) [pdf, other]
Title: Pre- to Post-Contrast Breast MRI Synthesis for Enhanced Tumour Segmentation
Comments: Accepted as oral presentation at SPIE Medical Imaging 2024 (Image Processing)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[476]  arXiv:2311.11745 (replaced) [pdf, other]
Title: ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
Comments: ICML 2024
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[477]  arXiv:2311.12356 (replaced) [pdf, other]
Title: Random Linear Projections Loss for Hyperplane-Based Optimization in Neural Networks
Subjects: Machine Learning (cs.LG)
[478]  arXiv:2311.15356 (replaced) [pdf, other]
Title: Having Second Thoughts? Let's hear it
Comments: 10 pages, 6 figures, 3 tables and Appendix/Supplementary materials. Section 3 has been substantially revised
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[479]  arXiv:2311.17138 (replaced) [pdf, other]
Title: Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now
Comments: Project Page: this https URL | First three authors contributed equally
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
[480]  arXiv:2311.17389 (replaced) [pdf, other]
Title: 360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries
Comments: CVPR 2024. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[481]  arXiv:2311.18189 (replaced) [pdf, other]
Title: Event-based Visual Inertial Velometer
Subjects: Robotics (cs.RO)
[482]  arXiv:2312.00110 (replaced) [pdf, other]
Title: CLIP-QDA: An Explainable Concept Bottleneck Model
Journal-ref: Transactions on Machine Learning Research (05/2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[483]  arXiv:2312.00752 (replaced) [pdf, other]
Title: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Authors: Albert Gu, Tri Dao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[484]  arXiv:2312.04234 (replaced) [pdf, other]
Title: Graph Convolutions Enrich the Self-Attention in Transformers!
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[485]  arXiv:2312.06717 (replaced) [pdf, other]
Title: Privacy Issues in Large Language Models: A Survey
Comments: May 2024 update
Subjects: Artificial Intelligence (cs.AI)
[486]  arXiv:2312.08291 (replaced) [pdf, other]
Title: VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[487]  arXiv:2312.09085 (replaced) [pdf, other]
Title: The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
Comments: Accepted to ACL'24 (Main). Camera-ready version
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computers and Society (cs.CY)
[488]  arXiv:2312.10045 (replaced) [pdf, other]
Title: Interpretable Knowledge Tracing via Response Influence-based Counterfactual Reasoning
Comments: ICDE'24 (fixing a few typos). Source code at this https URL. Keywords: knowledge tracing, interpretable machine learning, counterfactual reasoning, artificial intelligence for education
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[489]  arXiv:2312.13415 (replaced) [pdf, other]
Title: Higher-Order Staircase Codes
Comments: Submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
[490]  arXiv:2401.00368 (replaced) [pdf, other]
Title: Improving Text Embeddings with Large Language Models
Comments: Accepted by ACL 2024
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
[491]  arXiv:2401.00766 (replaced) [pdf, other]
Title: Exposure Bracketing is All You Need for Unifying Image Restoration and Enhancement Tasks
Comments: 21 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[492]  arXiv:2401.01681 (replaced) [pdf, other]
Title: Hamiltonicity of Schrijver graphs and stable Kneser graphs
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
[493]  arXiv:2401.03922 (replaced) [pdf, ps, other]
Title: SNeurodCNN: Structure-focused Neurodegeneration Convolutional Neural Network for Modelling and Classification of Alzheimer's Disease
Comments: 36 Pages, 10 figures, 4 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[494]  arXiv:2401.04928 (replaced) [pdf, other]
Title: Relaxed Contrastive Learning for Federated Learning
Subjects: Machine Learning (cs.LG)
[495]  arXiv:2401.11052 (replaced) [pdf, other]
Title: Mining experimental data from Materials Science literature with Large Language Models: an evaluation study
Comments: 40 pages: 5 figures and 1 table in the body. 32 Tables in the Appendix / Supplementary materials
Journal-ref: Science and Technology of Advanced Materials: Methods (2024)
Subjects: Computation and Language (cs.CL)
[496]  arXiv:2401.11053 (replaced) [pdf, other]
Title: StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[497]  arXiv:2401.11130 (replaced) [pdf, ps, other]
Title: Identification and Estimation of Conditional Average Partial Causal Effects via Instrumental Variable
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[498]  arXiv:2401.12459 (replaced) [pdf, other]
Title: Towards Socially and Morally Aware RL agent: Reward Design With LLM
Authors: Zhaoyue Wang
Subjects: Artificial Intelligence (cs.AI)
[499]  arXiv:2401.15713 (replaced) [pdf, other]
Title: Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[500]  arXiv:2401.15854 (replaced) [pdf, other]
Title: LSTM-based Deep Neural Network With A Focus on Sentence Representation for Sequential Sentence Classification in Medical Scientific Abstracts
Comments: Submitted to FedCSIS 2024
Subjects: Computation and Language (cs.CL)
[501]  arXiv:2401.16969 (replaced) [pdf, ps, other]
Title: Taxonomy of Mathematical Plagiarism
Comments: 46th European Conference on Information Retrieval (ECIR)
Subjects: Information Retrieval (cs.IR)
[502]  arXiv:2401.17045 (replaced) [pdf, ps, other]
Title: Explaining Explanations in Probabilistic Logic Programming
Authors: Germán Vidal
Comments: Submitted for publication
Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
[503]  arXiv:2402.00325 (replaced) [pdf, ps, other]
Title: Using digital twins for managing change in complex projects
Comments: 11 pages, 5 figures
Subjects: Systems and Control (eess.SY)
[504]  arXiv:2402.01000 (replaced) [pdf, other]
Title: Multivariate Probabilistic Time Series Forecasting with Correlated Errors
Comments: This paper extends the work presented in arXiv:2305.17028 to a multivariate setting
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[505]  arXiv:2402.01335 (replaced) [pdf, other]
Title: Simulator-Free Visual Domain Randomization via Video Games
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[506]  arXiv:2402.02216 (replaced) [pdf, other]
Title: Position: Graph Foundation Models are Already Here
Comments: 23 pages, 2 figures
Subjects: Machine Learning (cs.LG)
[507]  arXiv:2402.03299 (replaced) [pdf, other]
Title: GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
Comments: 28 pages
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[508]  arXiv:2402.03962 (replaced) [pdf, other]
Title: Position: Stop Making Unscientific AGI Performance Claims
Comments: 21 pages, 15 figures. Pre-print to be published at International Conference on Machine Learning (ICML) 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[509]  arXiv:2402.04513 (replaced) [pdf, other]
Title: Online Cascade Learning for Efficient Inference over Streams
Comments: ICML 2024 Main Conference Paper
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[510]  arXiv:2402.05841 (replaced) [pdf, other]
Title: Dirichlet Flow Matching with Applications to DNA Sequence Design
Comments: Published at ICML 2024. (Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024)
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
[511]  arXiv:2402.05861 (replaced) [pdf, other]
Title: Memory Consolidation Enables Long-Context Video Understanding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[512]  arXiv:2402.06497 (replaced) [pdf, other]
Title: Iris-SAM: Iris Segmentation Using a Foundation Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[513]  arXiv:2402.07043 (replaced) [pdf, other]
Title: A Tale of Tails: Model Collapse as a Change of Scaling Laws
Journal-ref: ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[514]  arXiv:2402.07131 (replaced) [pdf, other]
Title: Resampling methods for Private Statistical Inference
Comments: 45 pages
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Methodology (stat.ME)
[515]  arXiv:2402.08097 (replaced) [pdf, ps, other]
Title: An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[516]  arXiv:2402.08638 (replaced) [pdf, other]
[517]  arXiv:2402.09050 (replaced) [pdf, other]
Title: End-to-End Training Induces Information Bottleneck through Layer-Role Differentiation: A Comparative Analysis with Layer-wise Training
Comments: TMLR2024
Subjects: Machine Learning (cs.LG)
[518]  arXiv:2402.09615 (replaced) [pdf, other]
Title: API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[519]  arXiv:2402.09723 (replaced) [pdf, other]
Title: Efficient Prompt Optimization Through the Lens of Best Arm Identification
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[520]  arXiv:2402.09838 (replaced) [pdf, other]
Title: Performative Reinforcement Learning in Gradually Shifting Environments
Subjects: Machine Learning (cs.LG)
[521]  arXiv:2402.09894 (replaced) [pdf, other]
Title: Not Just Novelty: A Longitudinal Study on Utility and Customization of an AI Workflow
Comments: 22 pages, 16 figures. ACM Conference on Designing Interactive Systems (DIS 2024)
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
[522]  arXiv:2402.10705 (replaced) [pdf, other]
Title: AutoSAT: Automatically Optimize SAT Solvers via Large Language Models
Subjects: Artificial Intelligence (cs.AI)
[523]  arXiv:2402.11058 (replaced) [pdf, other]
Title: II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
Comments: Accepted to ACL 2024 Findings
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[524]  arXiv:2402.11438 (replaced) [pdf, other]
Title: The Road to Trust: Building Enclaves within Confidential VMs
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
[525]  arXiv:2402.11450 (replaced) [pdf, other]
[526]  arXiv:2402.12146 (replaced) [pdf, other]
Title: Enabling Weak LLMs to Judge Response Reliability via Meta Ranking
Comments: Preprint, under review. 28 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[527]  arXiv:2402.12550 (replaced) [pdf, other]
Title: Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
Comments: GitHub: this https URL. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[528]  arXiv:2402.13901 (replaced) [pdf, other]
Title: Non-asymptotic Convergence of Discrete-time Diffusion Models: New Approach and Improved Rate
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[529]  arXiv:2402.14991 (replaced) [pdf, other]
Title: Quantum Theory and Application of Contextual Optimal Transport
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET); Quantum Algebra (math.QA); Quantitative Methods (q-bio.QM); Quantum Physics (quant-ph)
[530]  arXiv:2402.15259 (replaced) [pdf, other]
Title: Open Ad Hoc Teamwork with Cooperative Game Theory
Comments: Published at ICML 2024, 29 pages
Subjects: Multiagent Systems (cs.MA); Machine Learning (cs.LG)
[531]  arXiv:2402.15938 (replaced) [pdf, other]
Title: Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models
Comments: Accepted to ACL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Software Engineering (cs.SE)
[532]  arXiv:2402.16714 (replaced) [pdf, other]
Title: Quantum linear algebra is all you need for Transformer architectures
Comments: 31 pages, 4 figures, 2 tables, comments are welcome
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[533]  arXiv:2402.16785 (replaced) [pdf, other]
Title: CARTE: Pretraining and Transfer for Tabular Learning
Subjects: Machine Learning (cs.LG)
[534]  arXiv:2402.17502 (replaced) [pdf, other]
Title: FedLPPA: Learning Personalized Prompt and Aggregation for Federated Weakly-supervised Medical Image Segmentation
Comments: 12 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[535]  arXiv:2402.17509 (replaced) [pdf, other]
Title: Extreme Miscalibration and the Illusion of Adversarial Robustness
Subjects: Computation and Language (cs.CL)
[536]  arXiv:2402.17810 (replaced) [pdf, other]
Title: BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning
Comments: Accepted by ACL 2024 (Findings)
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Biomolecules (q-bio.BM)
[537]  arXiv:2402.19473 (replaced) [pdf, other]
Title: Retrieval-Augmented Generation for AI-Generated Content: A Survey
Comments: Citing 334 papers, 21 pages, 1 table, 12 figures. Project: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[538]  arXiv:2403.00282 (replaced) [pdf, other]
Title: Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning
Comments: 25 pages
Subjects: Machine Learning (cs.LG)
[539]  arXiv:2403.01289 (replaced) [pdf, other]
Title: Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Comments: ACL 2024 (main)
Subjects: Computation and Language (cs.CL)
[540]  arXiv:2403.01371 (replaced) [pdf, other]
Title: eXponential FAmily Dynamical Systems (XFADS): Large-scale nonlinear Gaussian state-space modeling
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[541]  arXiv:2403.03031 (replaced) [pdf, other]
Title: Learning to Use Tools via Cooperative and Interactive Agents
Comments: Work in progress, 20 pages
Subjects: Computation and Language (cs.CL)
[542]  arXiv:2403.03938 (replaced) [pdf, other]
Title: GUIDE: Guidance-based Incremental Learning with Diffusion Models
Subjects: Machine Learning (cs.LG)
[543]  arXiv:2403.04626 (replaced) [pdf, other]
Title: MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder
Subjects: Image and Video Processing (eess.IV); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[544]  arXiv:2403.05300 (replaced) [pdf, other]
Title: Unity by Diversity: Improved Representation Learning in Multimodal VAEs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[545]  arXiv:2403.05826 (replaced) [pdf, other]
Title: Cached Model-as-a-Resource: Provisioning Large Language Model Agents for Edge Intelligence in Space-air-ground Integrated Networks
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[546]  arXiv:2403.09930 (replaced) [pdf, other]
Title: Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics
Comments: The first two authors contributed equally to this work
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[547]  arXiv:2403.10144 (replaced) [pdf, other]
Title: NLP Verification: Towards a General Methodology for Certifying Robustness
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
[548]  arXiv:2403.11067 (replaced) [pdf, other]
Title: Signal Fidelity in Degenerate and Nondegenerate Mode Parametric Amplifier Receiving Antennas
Comments: 5 pages, 6 figures. Submitted to IEEE Antennas and Wireless Propagation Letters March 15, 2024; revised May 30, 2024
Subjects: Systems and Control (eess.SY)
[549]  arXiv:2403.11353 (replaced) [pdf, other]
Title: AI-enabled prediction of NMR spectroscopy: Deducing 2-D NMR of carbohydrate
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Chemical Physics (physics.chem-ph)
[550]  arXiv:2403.12166 (replaced) [pdf, other]
Title: The Power of Few: Accelerating and Enhancing Data Reweighting with Coreset Selection
Comments: Accepted to ICASSP 2024
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[551]  arXiv:2403.12995 (replaced) [pdf, other]
Title: ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling
Comments: ICML2024 camera-ready, update some experimental results, add github url
Subjects: Biomolecules (q-bio.BM); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
[552]  arXiv:2403.14770 (replaced) [pdf, other]
Title: Beehive: A Flexible Network Stack for Direct-Attached Accelerators
Subjects: Hardware Architecture (cs.AR)
[553]  arXiv:2403.16290 (replaced) [pdf, other]
Title: An Information Theory Treatment of Animal Movement Tracks
Authors: Wayne M Getz
Comments: 21 pages, 2 tables, 1 figure
Subjects: Populations and Evolution (q-bio.PE); Information Theory (cs.IT)
[554]  arXiv:2403.16539 (replaced) [pdf, other]
Title: Data-Efficient 3D Visual Grounding via Order-Aware Referring
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[555]  arXiv:2403.18980 (replaced) [pdf, other]
Title: A census of graph-drawing algorithms based on generalized transversal structures
Subjects: Combinatorics (math.CO); Computational Geometry (cs.CG)
[556]  arXiv:2403.20216 (replaced) [pdf, ps, other]
Title: Distributed agency in second language learning and teaching through generative AI
Comments: 26 pages. Published in Language Learning & Technology, volume 28, issue 2, pp. 5-31: this http URL
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[557]  arXiv:2404.01489 (replaced) [pdf, other]
Title: Perceived Social Influence on Vaccination Decisions: A COVID-19 Case Study
Comments: Preprint of paper currently under review
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Physics and Society (physics.soc-ph)
[558]  arXiv:2404.01803 (replaced) [pdf, ps, other]
Title: Systematic Solutions to Login and Authentication Security Problems: A Dual-Password Login-Authentication Mechanism
Authors: Suyun Borjigin
Comments: 11 pages, 3 figures, 28 conferences
Subjects: Cryptography and Security (cs.CR); Emerging Technologies (cs.ET); Systems and Control (eess.SY)
[559]  arXiv:2404.03197 (replaced) [pdf, other]
Title: A Rolling Horizon Restoration Framework for Post-disaster Restoration of Electrical Distribution Networks
Comments: 26 pages, 17 figures
Subjects: Systems and Control (eess.SY)
[560]  arXiv:2404.04240 (replaced) [pdf, other]
Title: Dynamic Conditional Optimal Transport through Simulation-Free Flows
Subjects: Machine Learning (cs.LG)
[561]  arXiv:2404.07217 (replaced) [pdf, other]
Title: Attention-aware Semantic Communications for Collaborative Inference
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[562]  arXiv:2404.07402 (replaced) [pdf, other]
Title: An excursion onto Schrödinger's bridges: Stochastic flows with spatio-temporal marginals
Comments: 6 pages, 2 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Probability (math.PR)
[563]  arXiv:2404.07611 (replaced) [pdf, other]
Title: NoticIA: A Clickbait Article Summarization Dataset in Spanish
Comments: Accepted in the journal Procesamiento del Lenguaje Natural
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[564]  arXiv:2404.07989 (replaced) [pdf, other]
Title: Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding
Comments: Code and models are released at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[565]  arXiv:2404.08846 (replaced) [pdf, other]
Title: Experimental Design for Active Transductive Inference in Large Language Models
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[566]  arXiv:2404.09636 (replaced) [pdf, other]
Title: All-in-one simulation-based inference
Comments: To be published in the proceedings of the 41st International Conference on Machine Learning (ICML 2024), Vienna, Austria. PMLR 235, 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[567]  arXiv:2404.10445 (replaced) [pdf, other]
Title: SparseDM: Toward Sparse Efficient Diffusion Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[568]  arXiv:2404.10854 (replaced) [pdf, other]
Title: Methods to Estimate Cryptic Sequence Complexity
Subjects: Populations and Evolution (q-bio.PE); Neural and Evolutionary Computing (cs.NE)
[569]  arXiv:2404.11265 (replaced) [pdf, other]
Title: The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data
Comments: 13 pages, 6 figures, published to ICCV
Journal-ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 155-164
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[570]  arXiv:2404.13337 (replaced) [pdf, other]
Title: Fuzzychain: An Equitable Consensus Mechanism for Blockchain Networks
Comments: 16 pages, 13 figures, 2 tables, this article was submitted to a JCR journal
Subjects: Cryptography and Security (cs.CR); Emerging Technologies (cs.ET); Logic in Computer Science (cs.LO)
[571]  arXiv:2404.13895 (replaced) [pdf, other]
Title: Optimal Design for Human Feedback
Subjects: Machine Learning (cs.LG)
[572]  arXiv:2404.16698 (replaced) [pdf, other]
Title: Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents
Comments: Revised version
Subjects: Computation and Language (cs.CL)
[573]  arXiv:2404.16734 (replaced) [pdf, ps, other]
Title: Uniform Substitution for Differential Refinement Logic
Comments: IJCAR 2024
Subjects: Logic in Computer Science (cs.LO)
[574]  arXiv:2404.18084 (replaced) [pdf, other]
Title: Age-minimal Multicast by Graph Attention Reinforcement Learning
Subjects: Networking and Internet Architecture (cs.NI)
[575]  arXiv:2404.18239 (replaced) [pdf, other]
Title: SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[576]  arXiv:2404.18426 (replaced) [pdf, other]
Title: Efficient Meta-Learning Enabled Lightweight Multiscale Few-Shot Object Detection in Remote Sensing Images
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[577]  arXiv:2404.18886 (replaced) [pdf, other]
Title: A Survey on Diffusion Models for Time Series and Spatio-Temporal Data
Comments: Ongoing work; 27 pages, 8 figures, 2 tables; Github Repo: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[578]  arXiv:2404.19392 (replaced) [pdf, other]
Title: Convergence analysis of the transformed gradient projection algorithms on compact matrix manifolds
Comments: 45 pages, 5 figures, 4 tables
Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)
[579]  arXiv:2405.00515 (replaced) [pdf, other]
Title: GAD-Generative Learning for HD Map-Free Autonomous Driving
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[580]  arXiv:2405.00846 (replaced) [pdf, other]
Title: Gameplay Filters: Safe Robot Walking through Adversarial Imagination
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[581]  arXiv:2405.01834 (replaced) [pdf, other]
Title: 3-center and 4-center 2-particle Gaussian AO integrals on modern accelerated processors
Subjects: Computational Physics (physics.comp-ph); Materials Science (cond-mat.mtrl-sci); Distributed, Parallel, and Cluster Computing (cs.DC); Chemical Physics (physics.chem-ph)
[582]  arXiv:2405.02795 (replaced) [pdf, other]
Title: Graph as Point Set
Comments: accepted in ICML 2024
Subjects: Machine Learning (cs.LG)
[583]  arXiv:2405.02965 (replaced) [pdf, other]
Title: Robust Collaborative Perception without External Localization and Clock Devices
Comments: 6 pages, accepted to ICRA 2024
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)
[584]  arXiv:2405.05529 (replaced) [pdf, other]
Title: Tomur: Traffic-Aware Performance Prediction of On-NIC Network Functions with Multi-Resource Contention
Comments: Correct the typos in evaluation and appendix
Subjects: Networking and Internet Architecture (cs.NI)
[585]  arXiv:2405.07801 (replaced) [pdf, other]
Title: Deep Learning-Based Object Pose Estimation: A Comprehensive Survey
Comments: 27 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[586]  arXiv:2405.07960 (replaced) [pdf, other]
Title: AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)
[587]  arXiv:2405.08295 (replaced) [pdf, other]
Title: SpeechVerse: A Large-scale Generalizable Audio Language Model
Comments: Single column, 13 pages
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[588]  arXiv:2405.09691 (replaced) [pdf, ps, other]
Title: Modeling User Preferences via Brain-Computer Interfacing
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
[589]  arXiv:2405.10780 (replaced) [pdf, ps, other]
Title: Intelligent and Miniaturized Neural Interfaces: An Emerging Era in Neurotechnology
Journal-ref: 2024 IEEE Custom Integrated Circuits Conference (CICC), Denver, CO, USA, 2024, pp. 1-7
Subjects: Signal Processing (eess.SP); Hardware Architecture (cs.AR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
[590]  arXiv:2405.11129 (replaced) [pdf, other]
Title: MotionGS : Compact Gaussian Splatting SLAM by Motion Filter
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[591]  arXiv:2405.11190 (replaced) [pdf, other]
Title: ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[592]  arXiv:2405.11386 (replaced) [pdf, other]
Title: Liver Fat Quantification Network with Body Shape
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[593]  arXiv:2405.11656 (replaced) [pdf, other]
Title: URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images
Comments: Accepted at RSS2024
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[594]  arXiv:2405.12424 (replaced) [pdf, other]
Title: Rethinking Robustness Assessment: Adversarial Attacks on Learning-based Quadrupedal Locomotion Controllers
Comments: RSS 2024
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[595]  arXiv:2405.13350 (replaced) [pdf, other]
Title: Efficacy of ByT5 in Multilingual Translation of Biblical Texts for Underrepresented Languages
Comments: LXAI Workshop at the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[596]  arXiv:2405.13401 (replaced) [pdf, ps, other]
Title: TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
Comments: 19 pages, 14 figures, 4 tables
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
[597]  arXiv:2405.13540 (replaced) [pdf, other]
Title: Directly Denoising Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[598]  arXiv:2405.13923 (replaced) [pdf, other]
Title: Why Not Transform Chat Large Language Models to Non-English?
Subjects: Computation and Language (cs.CL)
[599]  arXiv:2405.14200 (replaced) [pdf, other]
Title: Awesome Multi-modal Object Tracking
Comments: A continuously updated project to track the latest progress in multi-modal object tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[600]  arXiv:2405.14578 (replaced) [pdf, other]
Title: Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Subjects: Machine Learning (cs.LG)
[601]  arXiv:2405.14622 (replaced) [pdf, other]
Title: Calibrated Self-Rewarding Vision Language Models
Comments: fix some typos and add acknowledgement section in V3
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[602]  arXiv:2405.15032 (replaced) [pdf, other]
[603]  arXiv:2405.15154 (replaced) [pdf, other]
Title: Online Prompt Pricing based on Combinatorial Multi-Armed Bandit and Hierarchical Stackelberg Game
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[604]  arXiv:2405.15187 (replaced) [pdf, other]
Title: Chance-Constrained Economic Dispatch with Flexible Loads and RES
Subjects: Systems and Control (eess.SY)
[605]  arXiv:2405.15259 (replaced) [pdf, other]
Title: Robust Economic Dispatch with Flexible Demand and Adjustable Uncertainty Set
Subjects: Systems and Control (eess.SY)
[606]  arXiv:2405.15465 (replaced) [pdf, other]
Title: Scale-Invariant Feature Disentanglement via Adversarial Learning for UAV-based Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[607]  arXiv:2405.15682 (replaced) [pdf, other]
Title: The Road Less Scheduled
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
[608]  arXiv:2405.15793 (replaced) [pdf, other]
Title: SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
Comments: Code, data, and demo available at this https URL
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[609]  arXiv:2405.15923 (replaced) [pdf, ps, other]
Title: Spiketrum: An FPGA-based Implementation of a Neuromorphic Cochlea
Comments: To be published at "IEEE Transactions on Circuits and Systems"
Subjects: Signal Processing (eess.SP); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[610]  arXiv:2405.15927 (replaced) [pdf, ps, other]
Title: Application based Evaluation of an Efficient Spike-Encoder, "Spiketrum"
Comments: To be published at "IEEE/ACM Transactions on Audio, Speech, and Language Processing"
Subjects: Signal Processing (eess.SP); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)
[611]  arXiv:2405.16021 (replaced) [pdf, other]
Title: VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration
Authors: Michael Ahn (1), Montserrat Gonzalez Arenas (1), Matthew Bennice (2), Noah Brown (5), Christine Chan (1), Byron David (1), Anthony Francis (4), Gavin Gonzalez (6), Rainer Hessmer (2), Tomas Jackson (6), Nikhil J Joshi (1), Daniel Lam (2), Tsang-Wei Edward Lee (1), Alex Luong (6), Sharath Maddineni (1), Harsh Patel (2), Jodilyn Peralta (6), Jornell Quiambao (5), Diego Reyes (5), Rosario M Jauregui Ruano (6), Dorsa Sadigh (1), Pannag Sanketi (1), Leila Takayama (3), Pavel Vodenski (2), Fei Xia (1) ((1) Google DeepMind, (2) Everyday Robots, (3) Hoku Labs, (4) Logical Robotics, (5) FS Studio, (6) Relentless Adrenalin)
Comments: 9 pages, 4 figures
Subjects: Robotics (cs.RO)
[612]  arXiv:2405.16056 (replaced) [pdf, other]
Title: FedSheafHN: Personalized Federated Learning on Graph-structured Data
Comments: This paper was submitted to ICML 2024 in Feb 2024. You can find a record here: this https URL
Subjects: Machine Learning (cs.LG)
[613]  arXiv:2405.16069 (replaced) [pdf, other]
Title: IncomeSCM: From tabular data set to time-series simulator and causal estimation benchmark
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
[614]  arXiv:2405.16649 (replaced) [pdf, other]
Title: Deep Koopman Learning using the Noisy Data
Subjects: Systems and Control (eess.SY)
[615]  arXiv:2405.17697 (replaced) [pdf, other]
Title: P4: Towards private, personalized, and Peer-to-Peer learning
Subjects: Machine Learning (cs.LG)
[616]  arXiv:2405.18418 (replaced) [pdf, other]
Title: Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Comments: Code and videos at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[617]  arXiv:2405.18558 (replaced) [pdf, other]
Title: "Golden Ratio Yoshimura" for Meta-Stable and Massively Reconfigurable Deployment
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[618]  arXiv:2405.18641 (replaced) [pdf, other]
Title: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning
Subjects: Machine Learning (cs.LG)
[619]  arXiv:2405.18657 (replaced) [pdf, other]
Title: The Efficacy of the Connect America Fund in Addressing US Internet Access Inequities
Subjects: Networking and Internet Architecture (cs.NI)
[620]  arXiv:2405.18669 (replaced) [pdf, other]
Title: Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
Comments: Under review at NeurIPS
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[621]  arXiv:2405.18839 (replaced) [pdf, other]
Title: MEGA: Masked Generative Autoencoder for Human Mesh Recovery
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[622]  arXiv:2405.18870 (replaced) [pdf, other]
Title: LLMs achieve adult human performance on higher-order theory of mind tasks
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[623]  arXiv:2405.18933 (replaced) [pdf, other]
Title: LSPI: Heterogeneous Graph Neural Network Classification Aggregation Algorithm Based on Size Neighbor Path Identification
Subjects: Machine Learning (cs.LG)
[624]  arXiv:2405.19059 (replaced) [pdf, other]
Title: Robust Entropy Search for Safe Efficient Bayesian Optimization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[625]  arXiv:2405.19092 (replaced) [pdf, other]
Title: Benchmarking and Improving Detail Image Caption
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[626]  arXiv:2405.19325 (replaced) [pdf, other]
Title: Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
Subjects: Computation and Language (cs.CL)
[627]  arXiv:2405.19358 (replaced) [pdf, other]
Title: Robustifying Safety-Aligned Large Language Models through Clean Data Curation
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[628]  arXiv:2405.19383 (replaced) [pdf, other]
Title: Network Analytics for Anti-Money Laundering -- A Systematic Literature Review and Experimental Evaluation
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
[629]  arXiv:2405.19542 (replaced) [pdf, other]
Title: Anatomical Region Recognition and Real-time Bone Tracking Methods by Dynamically Decoding A-Mode Ultrasound Signals
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Robotics (cs.RO)
[630]  arXiv:2405.19620 (replaced) [pdf, other]
Title: SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[631]  arXiv:2405.19670 (replaced) [pdf, other]
Title: One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models
Comments: Work in progress, repo: this https URL
Subjects: Computation and Language (cs.CL)
[632]  arXiv:2405.19687 (replaced) [pdf, other]
Title: Autonomous Driving with Spiking Neural Networks
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
[633]  arXiv:2405.19732 (replaced) [pdf, other]
Title: Two Optimizers Are Better Than One: LLM Catalyst for Enhancing Gradient-Based Optimization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[634]  arXiv:2405.19751 (replaced) [pdf, other]
Title: HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[635]  arXiv:2405.19787 (replaced) [pdf, other]
Title: From Symbolic Tasks to Code Generation: Diversification Yields Better Task Performers
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
[636]  arXiv:2405.19831 (replaced) [pdf, other]
Title: Just Rewrite It Again: A Post-Processing Method for Enhanced Semantic Similarity and Privacy Preservation of Differentially Private Rewritten Text
Comments: 10 pages, 2 figures, 2 tables. Accepted to ARES 2024 (IWAPS)
Subjects: Computation and Language (cs.CL)
[637]  arXiv:2405.19917 (replaced) [pdf, other]
Title: Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[638]  arXiv:2405.19961 (replaced) [pdf, other]
Title: Collective Variable Free Transition Path Sampling with Generative Flow Network
Comments: 9 pages, 5 figures, 2 tables
Subjects: Machine Learning (cs.LG)
[639]  arXiv:2405.19967 (replaced) [pdf, other]
Title: Improved Out-of-Scope Intent Classification with Dual Encoding and Threshold-based Re-Classification
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[640]  arXiv:2405.19996 (replaced) [pdf, other]
Title: DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[641]  arXiv:2405.20042 (replaced) [pdf, other]
Title: CycleFormer : TSP Solver Based on Language Modeling
Subjects: Machine Learning (cs.LG)
[642]  arXiv:2405.20052 (replaced) [pdf, other]
Title: Hardware-Efficient EMG Decoding for Next-Generation Hand Prostheses
Comments: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[643]  arXiv:2405.20067 (replaced) [pdf, other]
Title: N-Dimensional Gaussians for Fitting of High Dimensional Functions
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[644]  arXiv:2405.20078 (replaced) [pdf, ps, other]
Title: NeRF View Synthesis: Subjective Quality Assessment and Objective Metrics Evaluation
Subjects: Multimedia (cs.MM)
[645]  arXiv:2405.20081 (replaced) [pdf, other]
Title: NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models
Comments: 14 pages, 5 figures with supplementary material
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[646]  arXiv:2405.20091 (replaced) [pdf, other]
Title: Visual Attention Analysis in Online Learning
Comments: Accepted in CEDI 2024 (VII Congreso Español de Informática), A Coruña, Spain
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[647]  arXiv:2405.20142 (replaced) [pdf, other]
Title: MSSC-BiMamba: Multimodal Sleep Stage Classification and Early Diagnosis of Sleep Disorders with Bidirectional Mamba
Comments: 10 pages
Subjects: Artificial Intelligence (cs.AI)
[648]  arXiv:2405.20172 (replaced) [pdf, other]
Title: Iterative Feature Boosting for Explainable Speech Emotion Recognition
Comments: Published in: 2023 International Conference on Machine Learning and Applications (ICMLA)
Journal-ref: 2023 International Conference on Machine Learning and Applications (ICMLA), Jacksonville, FL, USA, 2023, pp. 543-549
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[649]  arXiv:2405.20195 (replaced) [pdf, other]
Title: Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations
Subjects: Human-Computer Interaction (cs.HC)
[650]  arXiv:2405.20247 (replaced) [pdf, other]
Title: KerasCV and KerasNLP: Vision and Language Power-Ups
Comments: Submitted to Journal of Machine Learning Open Source Software
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Software Engineering (cs.SE)
[651]  arXiv:2405.20299 (replaced) [pdf, other]
Title: Scaling White-Box Transformers for Vision
Comments: project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[652]  arXiv:2405.20310 (replaced) [pdf, other]
Title: A Pixel Is Worth More Than One 3D Gaussians in Single-View 3D Reconstruction
Comments: preprint, under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[653]  arXiv:2405.20319 (replaced) [pdf, other]
Title: ParSEL: Parameterized Shape Editing with Language
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Symbolic Computation (cs.SC)
[654]  arXiv:2405.20330 (replaced) [pdf, other]
Title: 4DHands: Reconstructing Interactive Hands in 4D with Transformers
Comments: More demo videos can be seen at our project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)