Dataset schema (column: type, value range or class count as shown in the preview):

- title: string, length 18–186
- year: int32, 2.02k–2.03k
- authors: string, length 14–2.61k
- venue: string, length 0–142
- abstract: string, length 22–5.94k
- is_seed: bool, 2 classes
- score: int32, 1–5
- analysis: string, length 249–693
- aspects: string, length 132–471
- found_from: string, 78 distinct values
- embedding: string, length 17.1k–17.2k
- umap_coords: string, length 56–64
- normalized_title: string, length 18–183
- pdf_url: string, length 29–283
- pdf_disclaimer: string, 1 distinct value
Achieving Peak Performance for Large Language Models: A Systematic Review
2024
[["Zhyar Rzgar K Rostam", "Buda Health Center"], ["S\u00e1ndor Sz\u00e9n\u00e1si", "Buda Health Center"], ["G\u00e1bor Kert\u00e9sz", "Institute for Computer Science and Control"]]
IEEE Access
In recent years, large language models (LLMs) have achieved remarkable success in natural language processing (NLP). LLMs require an extreme amount of parameters to attain high performance. As models grow into the trillion-parameter range, computational and memory costs increase significantly. This makes it difficu...
true
4
The abstract addresses algorithmic efficiency through training optimization and pruning, architectural design via LLM architecture trends, and hardware optimization with accelerator-aware methods. While it lacks focus on data processing selection, it provides a comprehensive review of methods relevant to making AI more...
{"algorithmic_efficiency": "Discusses training optimization and pruning strategies", "architectural_design": "Reviews architectural approaches in LLMs for efficiency", "data_processing_selection": "Does not address data selection or preprocessing", "hardware_optimization": "Explicitly covers hardware optimization strat...
{"references": [], "citations": ["achieving_peak_performance_for_large_language_models_a_systematic_review"]}
{"embedding": [0.018744714558124542, -0.020137012004852295, -0.0032274711411446333, 0.0011716699227690697, 0.023248694837093353, 0.008666163310408592, -0.01151387020945549, -0.02878744900226593, 0.02361520566046238, -0.010858137160539627, -0.04070904105901718, -0.002274261089041829, -0.007021588739007711, -0.0526145286...
{"umap_x": 2.6593964099884033, "umap_y": 4.380886077880859}
achieving_peak_performance_for_large_language_models_a_systematic_review
null
MizAR 60 for Mizar 50
2023
[["Ashish Vaswani", ""], ["Noam Shazeer", ""], ["Niki Parmar", ""], ["Jakob Uszkoreit", ""], ["Llion Jones", ""], ["Aidan N. Gomez", ""], ["\u0141ukasz Kaiser", ""], ["Illia Polosukhin", ""]]
Leibniz-Zentrum für Informatik (Schloss Dagstuhl)
As a present to Mizar on its 50th anniversary, we develop an AI/TP system that automatically proves about 60% of the Mizar theorems in the hammer setting. We also automatically prove 75% of the Mizar theorems when the automated provers are helped by using only the premises used in the human-written Mizar proofs. We des...
false
3
The paper addresses algorithmic efficiency through learning-based premise selection and data processing via active learning and corpus growth. However, it does not discuss architectural design, hardware co-design, or explicit optimization for deployment or training efficiency. While it touches on efficiency in automate...
{"algorithmic_efficiency": "Learning-based premise selection improves proof efficiency", "architectural_design": "No novel architecture proposed; relies on existing provers", "data_processing_selection": "Active learning and premise selection for proof generation", "hardware_optimization": "Not addressed; focuses on so...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "dfx_a_lowlatency_multifpga_appliance_for_accelerating_transformerbased_text_generation", "lightseq2_accelerated_training_for_transformerbased_models_on_gpus", "easy_and_efficient_transformer_scalable_inference_solution_for_larg...
{"embedding": [0.007636173628270626, 0.029408464208245277, -0.016905926167964935, 0.02631826512515545, 0.013454781845211983, 0.01152308750897646, -0.022448567673563957, -0.018709110096096992, -0.03171120956540108, 0.023483285680413246, -0.009818196296691895, -0.022903451696038246, 0.001448972150683403, -0.0005680182948...
{"umap_x": 0.9614489078521729, "umap_y": 1.2693313360214233}
mizar_60_for_mizar_50
null
Recent Advances in Natural Language Processing via Large Pre-trained Language Models: A Survey
2023
[["Bonan Min", "Amazon (United States)"], ["Hayley Ross", "Harvard University Press"], ["Elior Sulem", "California University of Pennsylvania"], ["Amir Pouran Ben Veyseh", "University of Oregon"], ["Thien Huu Nguyen", "University of Oregon"], ["Oscar Sainz", "University of the Basque Country"], ["Eneko Agirre", "Univer...
ACM Computing Surveys
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically changed the Natural Language Processing (NLP) field. For numerous NLP tasks, approaches leveraging PLMs have achieved state-of-the-art performance. The key idea is to learn a generic, latent representation of language from a generic task on...
false
2
The abstract focuses on PLM architectures and training paradigms but does not discuss algorithmic efficiency, architectural optimizations for speed/memory, data reduction methods, or hardware co-design. It provides a general survey of NLP advancements rather than addressing efficiency or accessibility for trainers or d...
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Not addressed", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "a_survey_of_text_classification_with_transformers_how_wide_how_large_how_long_how_accurate_how_expensive_how_safe"], "citations": ["easy_and_efficient_transformer_scalable_inference_solution_for_large_nlp_model"]}
{"embedding": [0.013605968095362186, 0.0028972828295081854, -0.007290709298104048, 0.027001887559890747, 0.000169931270647794, 0.008308188989758492, -0.00027251054416410625, -0.05947703495621681, -0.009720870293676853, 0.002297352533787489, -0.006747138220816851, -0.000958088377956301, 0.0088969049975276, -0.0156478676...
{"umap_x": 0.470488041639328, "umap_y": 2.85550856590271}
recent_advances_in_natural_language_processing_via_large_pretrained_language_models_a_survey
null
Talking about Large Language Models
2024
[["Murray Shanahan", "Imperial College London"]]
Communications of the ACM
Interacting with a contemporary LLM-based conversational agent can create an illusion of being in the presence of a thinking creature. Yet, in their very nature, such systems are fundamentally not like us.
false
1
The abstract discusses the phenomenological aspect of LLM interactions but does not touch on algorithmic efficiency, architectural design, data processing, or hardware optimization. It lacks technical details relevant to making AI more efficient or accessible for trainers and deployers.
{"algorithmic_efficiency": "not addressed", "architectural_design": "not addressed", "data_processing_selection": "not addressed", "hardware_optimization": "not addressed"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.018812954425811768, 0.025025248527526855, -0.017862729728221893, 0.009612304158508778, 0.02036122977733612, 0.024820098653435707, -0.0011499490356072783, -0.05679279565811157, -0.010810595005750656, 0.0258053969591856, -0.0251341313123703, -0.009443783201277256, -0.007301933132112026, -0.00335749844089...
{"umap_x": -0.6382922530174255, "umap_y": 2.345301389694214}
talking_about_large_language_models
https://dl.acm.org/doi/pdf/10.1145/3624724
DeepSpeed- Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
2022
[["Reza Yazdani Aminabadi", "Microsoft (United States)"], ["Samyam Rajbhandari", "Microsoft (United States)"], ["Ammar Ahmad Awan", "Microsoft (United States)"], ["Cheng Li", "Microsoft (United States)"], ["Canbing Li", "Microsoft (United States)"], ["Elton Zheng", "Microsoft (United States)"], ["Olatunji Ruwase", "Mic...
The landscape of transformer model inference is increasingly diverse in model size, model characteristics, latency and throughput requirements, hardware requirements, etc. With such diversity, designing a versatile inference system is challenging. DeepSpeed-Inference addresses these challenges by (1) a multi-GPU infere...
true
4
The abstract addresses algorithmic efficiency through sparse transformers and memory optimization, architectural design via multi-GPU and heterogeneous systems, and hardware optimization through CPU/NVMe/GPU co-design. It directly supports the research question by advancing efficient, scalable inference for large model...
{"algorithmic_efficiency": "Focuses on sparse transformers and memory-efficient inference", "architectural_design": "Provides multi-GPU and heterogeneous architecture for scalable inference", "data_processing_selection": "NONE", "hardware_optimization": "Leverages CPU/NVMe/GPU co-design for hardware-aware inference"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "efficient_llms_training_and_inference_an_introduction", "norm_tweaking_highperformance_lowbit_quantization_of_large_language_models"], "citations": ["deepspeed_inference_enabling_efficient_inference_of_transformer_models_at_unp...
{"embedding": [0.005358639173209667, 0.012594535015523434, -0.014781453646719456, 0.0105776097625494, 0.01512176264077425, -0.00659312354400754, 0.014560402370989323, -0.020127423107624054, -0.005209506954997778, 0.0002483512507751584, 0.007294585928320885, -0.01147187314927578, -0.027572590857744217, 0.000366264575859...
{"umap_x": 3.242175579071045, "umap_y": 5.023370742797852}
deepspeed_inference_enabling_efficient_inference_of_transformer_models_at_unprecedented_scale
null
Language Model Behavior: A Comprehensive Survey
2023
[["Tyler A. Chang", "University of California, San Diego"], ["Benjamin Bergen", "University of California, San Diego"]]
Computational Linguistics
Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before task-specific fine-tuning. Language models possess basic capabilities in synt...
false
1
The abstract does not discuss algorithmic efficiency, architectural design, data processing/selection, or hardware optimization. It focuses broadly on language model behavior, capabilities, and limitations rather than efficiency or accessibility improvements for trainers and deployers.
{"algorithmic_efficiency": "not addressed", "architectural_design": "not addressed", "data_processing_selection": "not addressed", "hardware_optimization": "not addressed"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": ["easy_and_efficient_transformer_scalable_inference_solution_for_large_nlp_model"]}
{"embedding": [0.013850832358002663, 0.009598132222890854, -0.022173628211021423, 0.026053739711642265, 0.018428746610879898, 0.02667960338294506, -0.005014568567276001, -0.025276267901062965, -0.019177529960870743, -0.0012178962351754308, -0.021060362458229065, 0.003723887959495187, 0.016096685081720352, -0.0233343504...
{"umap_x": 0.15042969584465027, "umap_y": 2.475184202194214}
language_model_behavior_a_comprehensive_survey
https://direct.mit.edu/coli/article-pdf/doi/10.1162/coli_a_00492/2177312/coli_a_00492.pdf
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
2023
[["Shenggui Li", "Singapore Institute of Technology"], ["Hongxin Liu", ""], ["Zhengda Bian", ""], ["Jiarui Fang", ""], ["Haichen Huang", ""], ["Yuliang Liu", ""], ["Boxiang Wang", "Singapore Institute of Technology"], ["Yang You", "National University of Singapore"]]
The success of Transformer models has pushed the deep learning model scale to billions of parameters, but the memory limitation of a single GPU has led to an urgent need for training on multi-GPU clusters. However, the best practice for choosing the optimal parallel strategy is still lacking, as it requires domain expe...
false
3
The abstract does not focus on algorithmic efficiency, architectural design, or data processing/selective strategies. It addresses hardware scalability through parallelism in distributed training, indirectly relating to hardware optimization via multi-GPU and heterogeneous system support. While relevant to large-scale ...
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Not directly addressed", "data_processing_selection": "Not addressed", "hardware_optimization": "Indirectly considered via hardware-aware parallelism"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "efficient_llms_training_and_inference_an_introduction"], "citations": []}
{"embedding": [-0.010109851136803627, 0.03784191980957985, -0.0004919135826639831, 0.0004271979269105941, -0.0022992815356701612, 0.01298289094120264, -0.004713696893304586, -0.03154461830854416, 0.007398037705570459, 0.021724022924900055, -0.022841591387987137, 0.005944942589849234, 0.0027237236499786377, -0.011469012...
{"umap_x": 3.6683053970336914, "umap_y": 4.17811918258667}
colossalai_a_unified_deep_learning_system_for_largescale_parallel_training
null
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
2023
[["Yujia Zhai", "University of California, Riverside"], ["Chengquan Jiang", ""], ["Leyuan Wang", ""], ["Xiaoying Jia", ""], ["Shang Zhang", "Nvidia (United States)"], ["Zizhong Chen", "University of California, Riverside"], ["Xin Liu", ""], ["Yibo Zhu", ""]]
Transformers have become keystone models in natural language processing over the past decade. They have achieved great popularity in deep learning applications, but the increasing sizes of the parameter spaces required by transformer models generate a commensurate need to accelerate performance. Natural language proces...
false
4
The paper addresses algorithmic efficiency through a padding-free approach that eliminates redundant computations on padded tokens, and enhances architectural design via optimized Multi-Head Attention modules. These improvements directly reduce computational and memory overhead in variable-length sequence processing, a...
{"algorithmic_efficiency": "Padding-free computation reduces redundant operations", "architectural_design": "Architecture-aware MHA optimizations improve performance", "data_processing_selection": "NONE", "hardware_optimization": "NONE"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.00898724514991045, 0.011069969274103642, -0.026894288137555122, 0.02090602181851864, 0.023065006360411644, -0.009173288941383362, 0.02364027500152588, -0.021254217252135277, -0.002448089886456728, -0.01972067542374134, -0.013147646561264992, -0.01075529120862484, 0.0014350530691444874, -0.0182675495743...
{"umap_x": 1.7971622943878174, "umap_y": 4.869078636169434}
bytetransformer_a_highperformance_transformer_boosted_for_variablelength_inputs
null
Sentiment Analysis with Neural Models for Hungarian
2023
[["L\u00e1szl\u00f3 J\u00e1nos Laki", "Hungarian Research Centre for Linguistics"], ["Zijian Gy\u0151z\u0151 Yang", ""]]
Acta Polytechnica Hungarica
Sentiment analysis is a powerful tool to gain insight into the emotional polarity of opinionated texts. Computerized applications can contribute to the establishment of next-generation models that can provide us with data of unprecedented quantity and quality. However, these models often require substantial amount of reso...
false
3
The paper addresses algorithmic efficiency through data augmentation to reduce training resource needs and improves model performance. While it uses advanced architectural designs (transformers) and data processing (cross-lingual transfer), it does not discuss hardware co-design or optimization. It is relevant to AI ef...
{"algorithmic_efficiency": "Data augmentation improves model efficiency", "architectural_design": "Uses neural transformer architecture for Hungarian sentiment analysis", "data_processing_selection": "Applies machine translation and cross-lingual transfer to expand training data", "hardware_optimization": "NONE"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [-0.016728023067116737, 0.0011090640909969807, -0.005491161718964577, -0.023904835805296898, 0.03722536936402321, 0.00013329229841474444, 0.006434789393097162, -0.05593295022845268, 0.02333408035337925, 0.0019759542774409056, -0.02092576026916504, 0.0033093139063566923, -0.022312456741929054, -0.034247461...
{"umap_x": -0.4204283058643341, "umap_y": 4.553020000457764}
sentiment_analysis_with_neural_models_for_hungarian
https://doi.org/10.12700/aph.20.5.2023.5.8
ResearchRabbit (product review)
2023
[["Victoria Cole", "University of Ottawa"], ["Mish Boutet", "University of Ottawa"]]
Journal of the Canadian Health Libraries Association / Journal de l'Association des bibliothèques de la santé du Canada
ResearchRabbit is a scholarly publication discovery tool supported by artificial intelligence (AI). It was developed in 2021 by a team of three in Seattle [1]. This tool lets users discover publications related to one or more seed publications with the help of visualization maps and lists of earlier, later, and similar p...
false
1
The abstract describes a scholarly discovery tool that uses AI for recommendation generation based on publication metadata and author networks, but does not address algorithmic efficiency, architectural design, data processing, or hardware optimization. It is unrelated to improving AI efficiency or accessibility in tra...
{"algorithmic_efficiency": "Not applicable", "architectural_design": "Not applicable", "data_processing_selection": "Not applicable", "hardware_optimization": "Not applicable"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [-0.024287216365337372, 0.010205757804214954, -0.012635764665901661, -0.02005367912352085, 0.007600550074130297, -0.005051074083894491, 0.003222872968763113, -0.017768915742635727, 0.0009575911099091172, 0.006911933422088623, -0.027985520660877228, -8.920094114728272e-05, 0.023579664528369904, -0.00313804...
{"umap_x": 0.004384573083370924, "umap_y": 3.5795931816101074}
researchrabbit_product_review
https://journals.library.ualberta.ca/jchla/index.php/jchla/article/download/29699/21872
Helping Cancer Patients to Choose the Best Treatment: Towards Automated Data-Driven and Personalized Information Presentation of Cancer Treatment Options
2024
[["Teven Le Scao", ""], ["Angela Fan", ""], ["Christopher Akiki", ""], ["Ellie Pavlick", ""], ["Suzana Ili\u0107", ""], ["Daniel Hesslow", ""], ["Roman Castagn\u00e9", ""], ["Alexandra Sasha Luccioni", ""], ["Fran\u00e7ois Yvon", "Laboratoire Interdisciplinaire des Sciences du Num\u00e9rique"], ["Matthias Gall\u00e9", ...
Leibniz-Zentrum für Informatik (Schloss Dagstuhl)
When a person is diagnosed with cancer, difficult decisions about treatments need to be made. In this chapter, we describe an interdisciplinary research project which aims to automatically generate personalized descriptions of treatment options for patients. We relied on two large databases provided by the Netherlands ...
false
1
The paper addresses data processing via integration of cancer registries and patient data for personalization, but does not discuss algorithmic efficiency, architectural design, or hardware optimization. It is not relevant to AI efficiency or accessibility for trainers/deployers in technical AI systems.
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Not addressed", "data_processing_selection": "Data integration from registries used", "hardware_optimization": "Not addressed"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "a_survey_on_model_compression_for_large_language_models"], "citations": []}
{"embedding": [-0.00651725335046649, 0.005067155696451664, -0.012801232747733593, 0.017768610268831253, 0.014745012857019901, -0.012779515236616135, -0.026963194832205772, -0.024592649191617966, -0.015463188290596008, -0.02961633913218975, -0.01939689740538597, 0.028315652161836624, 0.020130524411797523, -0.02873341925...
{"umap_x": -0.2869824767112732, "umap_y": 3.6795871257781982}
helping_cancer_patients_to_choose_the_best_treatment_towards_automated_datadriven_and_personalized_information_presentation_of_cancer_treatment_options
null
Efficient Memory Management for Large Language Model Serving with PagedAttention
2023
[["Woosuk Kwon", "Berkeley College"], ["Z. Li", "Berkeley College"], ["Siyuan Zhuang", "Berkeley College"], ["Ying Sheng", "Stanford University"], ["L Zheng", "Berkeley College"], ["Cody Hao Yu", ""], ["Joseph E. Gonzalez", "Berkeley College"], ["Hao Zhang", "University of California, San Diego"], ["Ion Stoica", "Berke...
High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks dynamically. When managed inefficiently, this memory can be significantly wasted...
false
4
The paper addresses algorithmic efficiency through memory reduction via PagedAttention, improving cache utilization in large language model serving. While focused on inference efficiency and memory management, it does not directly cover data processing, hardware co-design, or model architecture optimization for trainin...
{"algorithmic_efficiency": "PagedAttention reduces memory overhead via dynamic cache management", "architectural_design": "PagedAttention enables efficient memory sharing in LLM serving architecture", "data_processing_selection": "None", "hardware_optimization": "None"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "efficient_llms_training_and_inference_an_introduction"], "citations": []}
{"embedding": [0.010908889584243298, 0.00010443985229358077, -0.002104830928146839, 0.004587905015796423, 0.05478828400373459, 0.02455724962055683, 0.0022569408174604177, -0.06737828254699707, 0.011046898551285267, -0.015866421163082123, -0.03288358449935913, -0.017651589587330818, 0.004536256194114685, 0.0037816383410...
{"umap_x": 2.797762870788574, "umap_y": 4.450588226318359}
efficient_memory_management_for_large_language_model_serving_with_pagedattention
https://dl.acm.org/doi/pdf/10.1145/3600006.3613165
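The record above describes PagedAttention, which manages the KV cache in fixed-size blocks, much like virtual-memory paging, so memory is neither over-reserved nor fragmented per request. Below is a minimal sketch of that bookkeeping idea; the class, block size, and method names are illustrative assumptions, not vLLM's actual API.

```python
# Toy sketch of block-based KV-cache bookkeeping in the spirit of PagedAttention (illustrative only).
class PagedKVCache:
    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))     # physical block ids
        self.block_tables = {}                         # request id -> list of block ids
        self.lengths = {}                              # request id -> tokens stored

    def append_token(self, req_id: str) -> int:
        """Reserve a slot for one more token; grab a new block only when the last one is full."""
        table = self.block_tables.setdefault(req_id, [])
        length = self.lengths.get(req_id, 0)
        if length % self.block_size == 0:              # first token, or current block full
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            table.append(self.free_blocks.pop(0))
        self.lengths[req_id] = length + 1
        block = table[length // self.block_size]
        offset = length % self.block_size
        return block * self.block_size + offset        # physical slot for this token's K/V

    def free(self, req_id: str) -> None:
        """Return all blocks of a finished request to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(req_id, []))
        self.lengths.pop(req_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
print([cache.append_token("req-A") for _ in range(3)])   # [0, 1, 2]: two blocks used, no padding waste
cache.free("req-A")
print(len(cache.free_blocks))                             # 4: blocks are immediately reusable
```

Because a request only claims a new block when its current one fills, per-request waste is bounded by one block rather than by the maximum sequence length.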
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
2022
[["Seongmin Hong", "Korea Advanced Institute of Science and Technology"], ["Seungjae Moon", "Korea Advanced Institute of Science and Technology"], ["Junsoo Kim", "Korea Advanced Institute of Science and Technology"], ["Sungjae Lee", "Naver (South Korea)"], ["Minsub Kim", "Naver (South Korea)"], ["Dongsoo Lee", "Naver (...
Transformer is a deep learning language model widely used for natural language processing (NLP) services in datacenters. Among transformer models, Generative Pretrained Transformer (GPT) has achieved remarkable performance in text generation, or natural language generation (NLG), which needs the processing of a large i...
true
4
The paper focuses on hardware optimization through a dedicated multi-FPGA appliance designed to efficiently handle both summarization and generation stages of GPT-2, leveraging model parallelism and custom instructions. It directly addresses hardware optimization and architectural design for transformer inference, with...
{"algorithmic_efficiency": "Not addressed", "architectural_design": "FPGA-based hardware architecture optimized for low latency and throughput", "data_processing_selection": "Not addressed", "hardware_optimization": "Co-designed model-and-hardware-aware FPGA architecture with optimized dataflow and memory utilization"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": ["dfx_a_lowlatency_multifpga_appliance_for_accelerating_transformerbased_text_generation", "multiobjective_hardwaremapping_cooptimisation_for_multidnn_workloads_on_chipletbased_accelerators"]}
{"embedding": [0.007709955796599388, 0.012385492213070393, 0.0005226159119047225, 0.021266454830765724, 0.013571176677942276, 0.026654882356524467, 0.0016521007055416703, -0.005224340129643679, 0.02102954499423504, 0.003909608349204063, -0.006373608019202948, -0.013199692592024803, -0.019717304036021233, -0.01027669571...
{"umap_x": 3.1482186317443848, "umap_y": 5.06509256362915}
dfx_a_lowlatency_multifpga_appliance_for_accelerating_transformerbased_text_generation
null
Petals: Collaborative Inference and Fine-tuning of Large Models
2023
[["Alexander Borzunov", "National Research University Higher School of Economics"], ["Dmitry Baranchuk", "National Research University Higher School of Economics"], ["Tim Dettmers", "National Research University Higher School of Economics"], ["Maksim Riabinin", ""], ["Younes Belkada", "National Research University High...
Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Maksim Riabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, Colin Raffel. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). 2023.
false
3
The paper addresses algorithmic efficiency through collaborative fine-tuning to reduce training costs and improves architectural design via shared inference frameworks. While it touches on data selection through collaborative filtering, it does not explicitly address data preprocessing or synthesis. Hardware optimizati...
{"algorithmic_efficiency": "Focuses on collaborative fine-tuning for reduced training cost", "architectural_design": "Implements shared inference architecture for efficient model scaling", "data_processing_selection": "Uses collaborative filtering to optimize data utilization", "hardware_optimization": " NONE"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": ["easy_and_efficient_transformer_scalable_inference_solution_for_large_nlp_model"]}
{"embedding": [0.005658071022480726, -0.0006208659033291042, -0.03673231229186058, -0.007794434670358896, 0.03017568215727806, -0.01610506884753704, 0.008897530846297741, -0.026460397988557816, -0.0005138767301104963, 0.026755910366773605, 0.008478750474750996, -0.028113463893532753, 0.02311752177774906, -0.01564012281...
{"umap_x": 1.09892737865448, "umap_y": 1.52096426486969}
petals_collaborative_inference_and_finetuning_of_large_models
https://aclanthology.org/2023.acl-demo.54.pdf
LightSeq2: Accelerated Training for Transformer-Based Models on GPUs
2022
[["Xiaohui Wang", ""], ["Wei Yang", ""], ["Ying Xiong", ""], ["Guyue Huang", "University of California, Santa Barbara"], ["Xian Qian", ""], ["Yufei Ding", "University of California, Santa Barbara"], ["Mingxuan Wang", ""], ["Lei Li", "University of California, Santa Barbara"]]
Transformer-based neural models are used in many AI applications. Training these models is expensive, as it takes huge GPU resources and long duration. It is challenging because typical data like sentences have variable lengths, and Transformer's computation patterns are more complex than convolutional neural networks....
true
4
The paper addresses algorithmic efficiency and hardware optimization through GPU-specific tuning of Transformer training. It improves training speed across diverse architectures, directly contributing to AI efficiency and accessibility for trainers. This aligns strongly with the research question on efficient and acces...
{"algorithmic_efficiency": "Focus on training speed via GPU-specific optimizations", "architectural_design": "Supports multiple Transformer architectures including encoder-decoder", "data_processing_selection": "None mentioned", "hardware_optimization": "Proposes GPU-specific optimizations for Transformer computation f...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": ["lightseq2_accelerated_training_for_transformerbased_models_on_gpus"]}
{"embedding": [0.017972199246287346, 0.02991500124335289, 0.0017670871457085013, 0.015208289958536625, 0.013580096885561943, 0.003627980360761285, -0.016163107007741928, -0.004026077222079039, -0.0037318114191293716, -0.004397215321660042, -0.021021289750933647, 0.00771080469712615, -0.010456560179591179, -0.0291737169...
{"umap_x": 2.5692994594573975, "umap_y": 4.901721954345703}
lightseq2_accelerated_training_for_transformerbased_models_on_gpus
null
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way
2023
[["Kai Lv", "Fudan University"], ["Shuo Zhang", "Fudan University"], ["Tianle Gu", "Shanghai Artificial Intelligence Laboratory"], ["Shuhao Xing", "Fudan University"], ["Jiawei Hong", "Fudan University"], ["Keyu Chen", "Fudan University"], ["Xiaoran Liu", "Fudan University"], ["Yuqing Yang", "Fudan University"], ["Hong...
Kai Lv, Shuo Zhang, Tianle Gu, Shuhao Xing, Jiawei Hong, Keyu Chen, Xiaoran Liu, Yuqing Yang, Honglin Guo, Tengxiao Liu, Yu Sun, Qipeng Guo, Hang Yan, Xipeng Qiu. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2023.
false
3
The paper addresses algorithmic efficiency through collaborative distillation to reduce training cost, but does not detail architectural innovations, data processing selection, or hardware co-design. This relates to the research question primarily in model training efficiency via algorithmic methods, with limited cover...
{"algorithmic_efficiency": "Focus on efficient training via collaborative distillation", "architectural_design": "No novel architecture proposed; uses existing LLM structure", "data_processing_selection": "No explicit data selection or reduction strategy mentioned", "hardware_optimization": "No co-design with hardware ...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.012824637815356255, 0.014055171981453896, 0.0135489571839571, 0.021237250417470932, 0.016544681042432785, 0.01004746649414301, -0.005691109225153923, -0.02928038500249386, -0.011319328099489212, 0.008025520481169224, -0.023473311215639114, 0.03562609851360321, 0.0058477651327848434, -0.0134173389524221...
{"umap_x": 1.2128468751907349, "umap_y": 1.6265501976013184}
collie_collaborative_training_of_large_language_models_in_an_efficient_way
https://aclanthology.org/2023.emnlp-demo.48.pdf
Efficient Estimation of Word Representations in Vector Space
2022
[["Tom\u00e1\u0161 Mikolov", "Brno University of Technology"], ["Kai Chen", "Beijing University of Posts and Telecommunications"], ["Greg S. Corrado", "Google (United States)"], ["Jay B. Dean", "Google (United States)"]]
arXiv (Cornell University)
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. ...
false
3
The paper introduces efficient architectural designs that reduce computational cost while maintaining high accuracy in word representation tasks. It addresses algorithmic efficiency through optimized learning processes but does not involve data selection, hardware co-design, or lightweight model compression techniques,...
{"algorithmic_efficiency": "Focus on low-cost vector learning", "architectural_design": "Novel neural architectures for word embeddings", "data_processing_selection": "Uses large-scale data without selection strategies", "hardware_optimization": "None mentioned"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.008574225008487701, -0.02115696109831333, -0.036923523992300034, 0.004910628776997328, 0.016454622149467468, 0.006229114253073931, 0.009103929623961449, -0.01153800543397665, 0.0067582870833575726, 0.0319916270673275, -0.013759613037109375, 0.001152054755948484, -0.01327128428965807, 0.0097100222483277...
{"umap_x": 1.4145066738128662, "umap_y": 3.0212674140930176}
efficient_estimation_of_word_representations_in_vector_space
https://arxiv.org/pdf/1301.3781
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
2022
[["Mohammad Shoeybi", "Nvidia (United Kingdom)"], ["Mostofa Patwary", ""], ["Raul Puri", ""], ["Patrick LeGresley", ""], ["Jared Casper", ""], ["Bryan Catanzaro", ""]]
arXiv (Cornell University)
Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constraints. In this work, we present our techniques for training very large transforme...
false
3
The paper addresses algorithmic efficiency through model parallelism to scale training across GPUs, improves architectural design via effective layer normalization in large transformers, but does not cover data processing selection or hardware co-design. It contributes to efficient large-scale model training, directly ...
{"algorithmic_efficiency": "Focus on model parallelism for training efficiency", "architectural_design": "Utilizes large transformer architecture with optimized layer norms", "data_processing_selection": "None described; no data selection or preprocessing strategies", "hardware_optimization": "None explicitly discussed...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "lora_lowrank_adaptation_of_large_language_models", "dfx_a_lowlatency_multifpga_appliance_for_accelerating_transformerbased_text_generation", "sequence_parallelism_long_sequence_training_from_system_perspective", "alphatuning_qu...
{"embedding": [0.04100388288497925, 0.03935715928673744, -0.004435235168784857, 0.009641114622354507, 0.029430938884615898, -0.012583358213305473, -0.00019652086484711617, -0.028044404461979866, 0.0007970976294018328, -0.02201877534389496, -0.018556399270892143, 0.028289644047617912, 0.008885939605534077, -0.0212425254...
{"umap_x": 3.242035150527954, "umap_y": 4.424654483795166}
megatronlm_training_multibillion_parameter_language_models_using_model_parallelism
https://arxiv.org/pdf/1909.08053
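Megatron-LM's intra-layer model parallelism splits individual weight matrices across GPUs, for example column-wise for the first MLP projection, so each device holds and multiplies only a slice. A toy numpy sketch of the column-split idea follows, simulating two devices in one process; the shapes are arbitrary and the concatenate stands in for the all-gather a real implementation would issue over NCCL.

```python
import numpy as np

# Column-parallel split of a linear layer Y = X @ W, in the spirit of Megatron-LM (illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))           # (batch, hidden)
W = rng.normal(size=(8, 16))          # full weight (hidden, mlp_hidden)

W0, W1 = np.split(W, 2, axis=1)       # each "device" holds half of the output columns
Y0 = X @ W0                           # computed on device 0
Y1 = X @ W1                           # computed on device 1
Y = np.concatenate([Y0, Y1], axis=1)  # stand-in for the all-gather across devices

assert np.allclose(Y, X @ W)          # matches the unsplit computation
print(Y.shape)                        # (4, 16)
```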
Efficient Large Language Models: A Survey
2023
[["Zhongwei Wan", ""], ["Xin Wang", ""], ["Liu Che", ""], ["Samiul Alam", ""], ["Yu Zheng", ""], ["Zhongnan Qu", ""], ["Shen Yan", ""], ["Yi Zhu", ""], ["Quanlu Zhang", ""], ["Mosharaf Chowdhury", ""], ["M. Zhang", ""]]
arXiv (Cornell University)
Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding and language generation, and thus have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting th...
false
4
The abstract addresses algorithmic efficiency through model compression techniques, architectural design via lightweight and modular LLMs, data processing through data-centric strategies, and hardware optimization via framework co-design. This paper provides a comprehensive and structured overview of efficiency approac...
{"algorithmic_efficiency": "Covers pruning, quantization, distillation", "architectural_design": "Includes lightweight transformer variants and modular designs", "data_processing_selection": "Addresses data-centric strategies like active learning and distillation", "hardware_optimization": "Mentions framework-centric c...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "efficient_llms_training_and_inference_an_introduction"], "citations": []}
{"embedding": [0.04026489332318306, 0.028331121429800987, -0.018007345497608185, 0.005545796360820532, 0.021036235615611076, 0.018768614158034325, -0.01289068628102541, -0.019707852974534035, -0.004611155018210411, 0.0060470509342849255, -0.023842737078666687, -0.009070123545825481, -0.0036061182618141174, -0.042594231...
{"umap_x": 2.6283323764801025, "umap_y": 4.351630687713623}
efficient_large_language_models_a_survey
https://arxiv.org/pdf/2312.03863
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
2023
[["Mengzhou Xia", ""], ["Tianyu Gao", ""], ["Zhiyuan Zeng", ""], ["Danqi Chen", ""]]
arXiv (Cornell University)
The popularity of LLaMA (Touvron et al., 2023a;b) and other recently emerged moderate-sized large language models (LLMs) highlights the potential of building smaller yet powerful LLMs. Regardless, the cost of training such models from scratch on trillions of tokens remains high. In this work, we study structured prunin...
false
4
The paper addresses algorithmic efficiency through structured pruning, reducing model size and training compute by 97%, while preserving performance. It contributes directly to making LLMs more accessible by lowering training costs, aligning with the goal of efficiency for deployers and trainers, though it does not inv...
{"algorithmic_efficiency": "Structured pruning reduces model size and compute", "architectural_design": "Maintains core architecture with reduced layers and dimensions", "data_processing_selection": "None explicitly addressed", "hardware_optimization": "None explicitly addressed"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.03061865270137787, -0.028943132609128952, 0.010935727506875992, 0.011735375970602036, 0.004804610274732113, -0.006368751637637615, 0.012681609019637108, -0.04498782008886337, -0.005487318616360426, 0.0019806839991360903, -0.047109220176935196, 0.009638212621212006, -0.023738546296954155, -0.02018083259...
{"umap_x": 3.5552000999450684, "umap_y": 1.1378127336502075}
sheared_llama_accelerating_language_model_pretraining_via_structured_pruning
https://arxiv.org/pdf/2310.06694
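Structured pruning, as in the Sheared LLaMA record above, removes whole units (layers, attention heads, hidden dimensions) so the pruned model stays dense and runs fast on ordinary hardware. The sketch below drops low-magnitude MLP hidden neurons; the plain L2-norm criterion is an illustrative stand-in, not the learned pruning masks the paper actually uses.

```python
import numpy as np

# Toy structured pruning: drop the lowest-magnitude hidden neurons of an MLP block (illustrative).
rng = np.random.default_rng(0)
d_model, d_hidden, keep = 8, 32, 16

W_in = rng.normal(size=(d_model, d_hidden))    # projects into the hidden dimension
W_out = rng.normal(size=(d_hidden, d_model))   # projects back out

scores = np.linalg.norm(W_in, axis=0) * np.linalg.norm(W_out, axis=1)  # per-neuron importance proxy
kept = np.sort(np.argsort(scores)[-keep:])     # indices of the neurons we keep

W_in_pruned = W_in[:, kept]                    # (d_model, keep)
W_out_pruned = W_out[kept, :]                  # (keep, d_model)

x = rng.normal(size=(1, d_model))
full = np.maximum(x @ W_in, 0) @ W_out         # original MLP (ReLU)
small = np.maximum(x @ W_in_pruned, 0) @ W_out_pruned
print(W_in.size + W_out.size, "->", W_in_pruned.size + W_out_pruned.size, "parameters")
```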
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
2023
[["Elias Frantar", ""], ["Dan Alistarh", ""]]
arXiv (Cornell University)
Mixture-of-Experts (MoE) architectures offer a general solution to the high inference costs of large language models (LLMs) via sparse routing, bringing faster and more accurate models, at the cost of massive parameter counts. For example, the SwitchTransformer-c2048 model has 1.6 trillion parameters, requiring 3.2TB o...
false
5
The paper addresses algorithmic efficiency through sub-1-bit parameter compression, leverages MoE architectural design for sparse routing, and includes hardware optimization via custom GPU kernels. It directly relates to making AI more accessible by enabling trillion-parameter model deployment on commodity hardware wit...
{"algorithmic_efficiency": "Sub-1-bit parameter compression via QMoE", "architectural_design": "MoE architecture with sparse routing and custom kernels", "data_processing_selection": "NONE", "hardware_optimization": "Co-designed GPU kernels for efficient end-to-end inference"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [8.466185136057902e-06, 0.0018028955673798919, -0.022908544167876244, 0.03432544320821762, 0.01217771414667368, 0.0066304877400398254, 0.02906147576868534, -0.0162091925740242, -0.0078027150593698025, 0.01061217114329338, -0.022028297185897827, 0.0029864744283258915, -0.0020897819194942713, -0.02041719295...
{"umap_x": 3.784914970397949, "umap_y": 5.1092095375061035}
qmoe_practical_sub1bit_compression_of_trillionparameter_models
https://arxiv.org/pdf/2310.16795
FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs
2023
[["Young Jin Kim", ""], ["Rawn Henry", ""], ["Raffy Fahim", ""], ["Hany Hassan Awadalla", ""]]
arXiv (Cornell University)
Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment due to their substantial memory requirements. Furthermore, the latest generative models suffer from high inference costs caused by the memory bandwidth bottleneck in the aut...
false
5
The paper focuses on algorithmic efficiency through fine-grained weight-only quantization, applicable to various architectures without redesign. It enhances inference speed and reduces memory bandwidth bottlenecks via hardware-aware matrix computation, directly addressing efficiency in deployment. This aligns strongly ...
{"algorithmic_efficiency": "Weight-only quantization reduces inference cost", "architectural_design": "Applies to dense and MoE models without architecture changes", "data_processing_selection": "Not addressed", "hardware_optimization": "Uses GPU GEMMs for efficient on-the-fly matrix operations"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.01722078025341034, 0.010604764334857464, -0.009079094976186752, -0.0033601915929466486, 0.050662726163864136, 0.03012116439640522, -0.00540671031922102, -0.006922718603163958, -0.009112469851970673, 0.008411252871155739, -0.029299361631274223, -0.01347169280052185, -0.011385845951735973, -0.05799345672...
{"umap_x": 4.403202533721924, "umap_y": 0.2184237241744995}
finequant_unlocking_efficiency_with_finegrained_weightonly_quantization_for_llms
https://arxiv.org/pdf/2308.09723
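Weight-only quantization such as FineQuant stores weights in low precision per small group and dequantizes them on the fly, cutting the memory-bandwidth cost that dominates autoregressive inference. A minimal symmetric int8 group-wise round-trip in numpy is shown below; the group size and scaling scheme are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

# Group-wise, symmetric int8 weight-only quantization round-trip (illustrative).
def quantize_groupwise(W: np.ndarray, group_size: int = 64, bits: int = 8):
    qmax = 2 ** (bits - 1) - 1
    flat = W.reshape(-1, group_size)                        # one scale per group of weights
    scales = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)             # avoid division by zero
    q = np.clip(np.round(flat / scales), -qmax, qmax).astype(np.int8)
    return q, scales

def dequantize_groupwise(q: np.ndarray, scales: np.ndarray, shape):
    return (q.astype(np.float32) * scales).reshape(shape)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)
q, scales = quantize_groupwise(W)
W_hat = dequantize_groupwise(q, scales, W.shape)

x = rng.normal(size=(1, 256)).astype(np.float32)
err = np.abs(x @ W - x @ W_hat).max()
print("fp32:", W.nbytes, "bytes -> int8 + scales:", q.nbytes + scales.nbytes, "bytes; max |err| =", float(err))
```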
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
2022
[["Se Jung Kwon", ""], ["Jeonghoon Kim", ""], ["Jeongin Bae", ""], ["Kang Min Yoo", ""], ["Jin-Hwa Kim", ""], ["Baeseong Park", ""], ["Byeongwook Kim", ""], ["Jung-Woo Ha", ""], ["Nako Sung", ""], ["Dongsoo Lee", ""]]
arXiv (Cornell University)
There are growing interests in adapting large-scale language models using parameter-efficient fine-tuning methods. However, accelerating the model itself and achieving better inference efficiency through model compression has not been thoroughly explored yet. Model compression could provide the benefits of reducing mem...
true
4
The paper addresses algorithmic efficiency through quantization-aware adaptation and parameter reduction, enabling significant compression without full model retraining. It directly relates to the research question by focusing on model efficiency and parameter reduction, particularly in the context of large language mo...
{"algorithmic_efficiency": "Uses quantization and pruning for efficiency", "architectural_design": "No novel architecture proposed", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": ["alphatuning_quantizationaware_parameterefficient_adaptation_of_largescale_pretrained_language_models"]}
{"embedding": [0.013053199276328087, 0.033066511154174805, -0.0009465421899221838, -0.007559916470199823, 0.005016125272959471, -0.009896357543766499, 0.02773279882967472, 0.016582870855927467, -0.0056730047799646854, 0.004793745931237936, -0.026011807844042778, -0.004745102021843195, 0.0018868736224249005, -0.05428681...
{"umap_x": 3.883601427078247, "umap_y": 0.38775232434272766}
alphatuning_quantizationaware_parameterefficient_adaptation_of_largescale_pretrained_language_models
https://arxiv.org/pdf/2210.03858
Large Language Models: A Comprehensive Survey of its Applications, Challenges, Limitations, and Future Prospects
2023
[["Muhammad Usman Hadi", "University of Ulster"], ["qasem al tashi", "The University of Texas MD Anderson Cancer Center"], ["Rizwan Qureshi", "The University of Texas MD Anderson Cancer Center"], ["Abbas Shah", "Mehran University of Engineering and Technology"], ["Amgad Muneer", "The University of Texas MD Anderson Can...
Within the vast expanse of computerized language processing, a revolutionary entity known as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to comprehend intricate linguistic patterns and conjure coherent and contextually fitting responses. Large language models (LLMs) are a t...
false
3
The abstract covers LLMs' architecture and training methods broadly but does not address algorithmic efficiency, data processing, or hardware optimization specifically. It provides general context on LLMs without detailing techniques for making AI efficient, accessible, or hardware-aware, thus lacking direct relevance ...
{"algorithmic_efficiency": "Mentions training methods but no specific efficiency techniques", "architectural_design": "Discusses GPT architecture as foundational but no novel optimizations", "data_processing_selection": "No mention of data selection, synthesis, or preprocessing strategies", "hardware_optimization": "Re...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": ["easy_and_efficient_transformer_scalable_inference_solution_for_large_nlp_model"]}
{"embedding": [0.005666312761604786, 0.010979367420077324, -0.004834004212170839, 0.02630424313247204, 0.01625974290072918, 0.0013138999929651618, -0.005056085530668497, -0.030348146334290504, -0.024305226281285286, 0.004027986899018288, -0.007642025593668222, -0.006406105123460293, 0.028320031240582466, -0.02684817090...
{"umap_x": -0.16001614928245544, "umap_y": 2.863895893096924}
large_language_models_a_comprehensive_survey_of_its_applications_challenges_limitations_and_future_prospects
https://www.techrxiv.org/articles/preprint/A_Survey_on_Large_Language_Models_Applications_Challenges_Limitations_and_Practical_Usage/23589741/3/files/42375240.pdf
Sequence Parallelism: Long Sequence Training from System Perspective
2023
[["Shenggui Li", "National University of Singapore"], ["Fuzhao Xue", "National University of Singapore"], ["Chaitanya Baranwal", "National University of Singapore"], ["Yongbin Li", "National University of Singapore"], ["Yang You", "National University of Singapore"]]
Transformer achieves promising results on various tasks. However, self-attention suffers from quadratic memory requirements with respect to the sequence length. Existing work focuses on reducing time and space complexity from an algorithm perspective. In this work, we propose sequence parallelism, a memory-efficient pa...
true
4
The paper addresses algorithmic efficiency through linear-complexity attention and system-level sequence parallelism, enabling long sequences without memory explosion. It improves scalability across hardware by distributing sequence chunks across GPUs, making it highly relevant to AI efficiency and hardware-aware train...
{"algorithmic_efficiency": "Linear-complexity attention via Ring Self-Attention", "architectural_design": "System-level parallelism enabling long-sequence training", "data_processing_selection": "None", "hardware_optimization": "GPU-based parallelism with efficient distributed computation"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": ["sequence_parallelism_long_sequence_training_from_system_perspective"]}
{"embedding": [0.009661182761192322, 0.039885539561510086, 0.0022265783045440912, 0.012173784896731377, 0.012189852073788643, 0.03944573923945427, 0.013937323354184628, -0.0251475740224123, -0.0126560153439641, 0.005944029428064823, -0.013178933411836624, 0.004162903875112534, 0.010372107848525047, 0.008103701286017895...
{"umap_x": 2.2824063301086426, "umap_y": 4.8038716316223145}
sequence_parallelism_long_sequence_training_from_system_perspective
https://aclanthology.org/2023.acl-long.134.pdf
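Sequence parallelism splits the sequence dimension itself across devices, and Ring Self-Attention circulates key/value chunks so every query chunk eventually attends to the whole sequence. The numpy sketch below is a serial stand-in for that idea, verifying that chunked attention reproduces the full computation; it is illustrative only and ignores the actual ring communication schedule.

```python
import numpy as np

# Toy sketch of sequence parallelism: each "device" owns a query chunk and visits every
# key/value chunk in turn (a serial stand-in for ring communication). Illustrative only.
def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq, d, n_dev = 8, 4, 2
Q, K, V = (rng.normal(size=(seq, d)) for _ in range(3))

q_chunks = np.split(Q, n_dev)                        # each device owns one query chunk
k_chunks, v_chunks = np.split(K, n_dev), np.split(V, n_dev)

outputs = []
for q in q_chunks:                                    # work done "on" each device
    scores = [q @ k.T for k in k_chunks]              # key chunks received one at a time
    attn = softmax(np.concatenate(scores, axis=-1) / np.sqrt(d))
    out = sum(a @ v for a, v in zip(np.split(attn, n_dev, axis=-1), v_chunks))
    outputs.append(out)

full = softmax(Q @ K.T / np.sqrt(d)) @ V              # reference: no parallelism
assert np.allclose(np.concatenate(outputs), full)
print("chunked attention matches the full computation")
```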
Norm Tweaking: High-Performance Low-Bit Quantization of Large Language Models
2024
[["Liang Li", ""], ["Qingyuan Li", ""], ["Bo Zhang", ""], ["Xiangxiang Chu", ""]]
Proceedings of the AAAI Conference on Artificial Intelligence
As the size of large language models (LLMs) continues to grow, model compression without sacrificing accuracy has become a crucial challenge for deployment. While some quantization methods, such as GPTQ, have made progress in achieving acceptable 4-bit weight-only quantization, attempts at lower-bit quantization often ...
true
4
The paper addresses algorithmic efficiency through low-bit quantization via norm tweaking, improving model compression without accuracy loss. It directly relates to the research question by demonstrating efficient LLM compression using advanced algorithmic techniques, making it highly relevant to AI efficiency and acce...
{"algorithmic_efficiency": "Introduces norm tweaking for low-bit quantization", "architectural_design": "Focuses on quantization-aware adjustments, not architecture redesign", "data_processing_selection": "No data selection or preprocessing strategies discussed", "hardware_optimization": "No co-design with hardware or ...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": ["norm_tweaking_highperformance_lowbit_quantization_of_large_language_models"]}
{"embedding": [0.03657093271613121, 0.02959270589053631, -0.022045500576496124, 0.01814325712621212, 0.034069422632455826, 0.0369650162756443, 0.027276864275336266, -0.01329090166836977, -0.01629781164228916, 0.025308843702077866, -0.036803025752305984, 0.009830676950514317, -0.006988047156482935, -0.05561862885951996,...
{"umap_x": 4.645567893981934, "umap_y": 0.3154524862766266}
norm_tweaking_highperformance_lowbit_quantization_of_large_language_models
https://ojs.aaai.org/index.php/AAAI/article/download/29815/31414
Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Model
2022
[["Gongzheng Li", "NetEase (China)"], ["Yadong Xi", "NetEase (China)"], ["Jingzhen Ding", "NetEase (China)"], ["Duan Wang", "NetEase (China)"], ["Ziyang Luo", "Hong Kong Baptist University"], ["Rongsheng Zhang", "NetEase (China)"], ["Bai Liu", "NetEase (China)"], ["Changjie Fan", "NetEase (China)"], ["Xiaoxi Mao", "Net...
Gongzheng Li, Yadong Xi, Jingzhen Ding, Duan Wang, Ziyang Luo, Rongsheng Zhang, Bai Liu, Changjie Fan, Xiaoxi Mao, Zeng Zhao. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track. 2022.
true
4
The paper focuses on an efficient transformer architecture that reduces computational cost through pruning and quantization, enhancing algorithmic efficiency and inference speed. This directly addresses the research question by improving AI efficiency for deployers, especially in large-scale NLP models, though it does ...
{"algorithmic_efficiency": "Lightweight inference via quantization and pruning", "architectural_design": "Efficient transformer architecture for scalable inference", "data_processing_selection": "NONE", "hardware_optimization": "NONE"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [-0.0017156979301944375, 0.009493980556726456, -0.00873572938144207, 0.028826255351305008, 0.017881128937005997, 0.005072024650871754, -0.0022384002804756165, -0.03023265302181244, -0.009327973239123821, -0.009714856743812561, -0.008558373898267746, 0.0018664973322302103, -0.005373497027903795, -0.0183749...
{"umap_x": 1.4800645112991333, "umap_y": 4.305948734283447}
easy_and_efficient_transformer_scalable_inference_solution_for_large_nlp_model
https://aclanthology.org/2022.naacl-industry.8.pdf
LLaMA: Open and Efficient Foundation Language Models
2023
[["Hugo Touvron", ""], ["Thibaut Lavril", ""], ["Gautier Izacard", ""], ["Xavier Martinet", ""], ["Marie-Anne Lachaux", ""], ["Timoth\u00e9e Lacroix", ""], ["Baptiste Rozi\u00e8re", ""], ["Naman Goyal", ""], ["Eric Hambro", ""], ["Faisal Azhar", ""], ["Aurelien Rodriguez", ""], ["Armand Joulin", ""], ["\u00c9douard Gra...
arXiv (Cornell University)
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In partic...
false
2
The abstract focuses on model scale and training data accessibility, not algorithmic compression, architectural efficiency, data reduction, or hardware co-design. While LLaMA enables broader access to AI models, it does not address efficiency improvements for trainers or deployers in the specified technical aspects.
{"algorithmic_efficiency": "Not explicitly addressed", "architectural_design": "Not explicitly addressed", "data_processing_selection": "No discussion of data selection or preprocessing", "hardware_optimization": "Not discussed"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "a_survey_on_model_compression_for_large_language_models", "norm_tweaking_highperformance_lowbit_quantization_of_large_language_models"], "citations": []}
{"embedding": [0.028069112449884415, 0.003256347496062517, -0.0038937567733228207, 0.012568417005240917, 0.023928994312882423, -0.01799316145479679, -0.02638695202767849, -0.04716646298766136, -0.012254024855792522, -0.0010661542182788253, -0.03463950753211975, -0.02418670989573002, -0.005406314507126808, -0.0409579016...
{"umap_x": 0.8774848580360413, "umap_y": 2.4588992595672607}
llama_open_and_efficient_foundation_language_models
https://arxiv.org/pdf/2302.13971
LoRA: Low-Rank Adaptation of Large Language Models
2022
[["J. Edward Hu", ""], ["Yelong Shen", ""], ["Phillip Wallis", ""], ["Zeyuan Allen-Zhu", ""], ["Yuanzhi Li", ""], ["Shean Wang", ""], ["Weizhu Chen", ""]]
arXiv (Cornell University)
An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying indepen...
true
4
LoRA improves algorithmic efficiency by drastically reducing trainable parameters and GPU memory usage through low-rank adaptation, maintaining model performance without altering architecture or data processing. This directly supports AI efficiency for trainers and deployers by enabling faster, cheaper fine-tuning with...
{"algorithmic_efficiency": "Low-rank matrix decomposition reduces trainable parameters", "architectural_design": "Applies to standard Transformer layers without architectural change", "data_processing_selection": "Not directly addressed", "hardware_optimization": "Not directly addressed"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "alphatuning_quantizationaware_parameterefficient_adaptation_of_largescale_pretrained_language_models"], "citations": []}
{"embedding": [0.017740044742822647, -0.003109108190983534, -0.008134584873914719, 0.010051787830889225, 0.02310682088136673, 0.011644699610769749, 0.01772686466574669, -0.027575552463531494, -0.0006403934676200151, 0.018991440534591675, -0.03545672819018364, 0.025072164833545685, -0.029526976868510246, -0.037177309393...
{"umap_x": 3.503511905670166, "umap_y": 1.3762201070785522}
lora_lowrank_adaptation_of_large_language_models
https://arxiv.org/pdf/2106.09685
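LoRA freezes the pretrained weight and learns only a low-rank update, so the number of trainable parameters drops from d_in*d_out to r*(d_in + d_out). The numpy sketch below follows the paper's formulation (down-projection initialized randomly, up-projection initialized to zero, update scaled by alpha/r); the dimensions and constants are illustrative.

```python
import numpy as np

# LoRA forward pass: h = x @ W + (alpha / r) * (x @ A @ B), with W frozen (illustrative sketch).
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 512, 512, 8, 16

W = rng.normal(scale=0.02, size=(d_in, d_out))   # frozen pretrained weight
A = rng.normal(scale=0.01, size=(d_in, r))       # trainable low-rank "down" projection
B = np.zeros((r, d_out))                         # trainable "up" projection, zero-initialized
                                                 # so the update starts as a no-op

def lora_forward(x):
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(2, d_in))
print(lora_forward(x).shape)                     # (2, 512)
print("trainable:", A.size + B.size, "vs full fine-tuning:", W.size)   # 8192 vs 262144
```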
Scaling Laws for Neural Language Models
2022
[["Jared Kaplan", "Johns Hopkins University"], ["Sam McCandlish", "OpenAI (United States)"], ["Tom Henighan", "OpenAI (United States)"], ["T. B. Brown", "OpenAI (United States)"], ["Benjamin Chess", "OpenAI (United States)"], ["Rewon Child", "OpenAI (United States)"], ["Scott Gray", "OpenAI (United States)"], ["Alec Ra...
arXiv (Cornell University)
We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth h...
false
3
The paper identifies efficient scaling in model and data size, showing large models achieve better sample efficiency with less data. While it touches on computational efficiency and training speed, it does not address algorithmic compression, hardware co-design, or data preprocessing strategies directly, making it rele...
{"algorithmic_efficiency": "Models scale efficiently with compute and data", "architectural_design": "Architecture width/depth have minimal impact", "data_processing_selection": "Larger models are more sample-efficient", "hardware_optimization": "None"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "easy_and_efficient_transformer_scalable_inference_solution_for_large_nlp_model", "a_survey_of_text_classification_with_transformers_how_wide_how_large_how_long_how_accurate_how_expensive_how_safe", "a_survey_on_model_compressio...
{"embedding": [-0.00876977015286684, -0.0039003777783364058, -0.013616055250167847, 0.01659400202333927, 0.0077790203504264355, 7.5198108788754325e-06, 0.02720728889107704, -0.04012727737426758, -0.018783556297421455, -0.005879230331629515, -0.03345382213592529, -0.0006703215767629445, 0.009391481056809425, -0.02575855...
{"umap_x": 1.9650073051452637, "umap_y": 2.672745943069458}
scaling_laws_for_neural_language_models
https://arxiv.org/pdf/2001.08361
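The power-law relationships summarized in the abstract can be written schematically as below; the constants N_c, D_c, C_c and the exponents are empirical fits reported in the paper, so only the functional form is shown here.

```latex
L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N},\qquad
L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D},\qquad
L(C) \approx \left(\tfrac{C_c}{C}\right)^{\alpha_C}
```

Here L is the cross-entropy loss, N the model size (the paper uses non-embedding parameters), D the dataset size in tokens, and C the training compute, each varied while the other factors are not a bottleneck.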
A Survey of Large Language Models
2,023
[["Wayne Xin Zhao", ""], ["Kun Zhou", ""], ["Junyi Li", ""], ["Tianyi Tang", ""], ["Xiaolei Wang", ""], ["Yupeng Hou", ""], ["Yingqian Min", ""], ["Beichen Zhang", ""], ["Junjie Zhang", ""], ["Zican Dong", ""], ["Yifan Du", ""], ["Yang Chen", ""], ["Yushuo Chen", ""], ["Zhipeng Chen", ""], ["Jinhao Jiang", ""], ["Ruiya...
arXiv (Cornell University)
Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in...
false
3
The abstract discusses algorithmic efficiency through model scaling, architectural design via Transformer-based models, but does not address data processing selection or hardware optimization. While it touches on efficiency gains from scale, it lacks specifics on compression, pruning, or hardware-aware design. This pap...
{"algorithmic_efficiency": "Focuses on scaling and model size improvements", "architectural_design": "Highlights Transformer architecture as core design", "data_processing_selection": "None explicitly mentioned", "hardware_optimization": "None explicitly mentioned"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "a_survey_on_model_compression_for_large_language_models"], "citations": []}
{"embedding": [0.02106095850467682, 0.02729201503098011, -0.028415847569704056, 0.019447773694992065, 0.009253825061023235, 0.014826541766524315, 0.009625006467103958, -0.034373920410871506, -0.00616926746442914, 0.005304890684783459, -0.003205004846677184, -0.010054223239421844, 0.02535950019955635, -0.035909790545701...
{"umap_x": 0.13699249923229218, "umap_y": 2.821420192718506}
a_survey_of_large_language_models
https://arxiv.org/pdf/2303.18223
Emergent Abilities of Large Language Models
2,022
[["Jason Lee", ""], ["Yi Tay", ""], ["Rishi Bommasani", ""], ["Colin Raffel", ""], ["Barret Zoph", ""], ["Sebastian Borgeaud", ""], ["Dani Yogatama", ""], ["Maarten Bosma", ""], ["Denny Zhou", ""], ["Donald Metzler", ""], ["Ed H.", ""], ["Tatsunori Hashimoto", ""], ["Oriol Vinyals", ""], ["Percy Liang", ""], ["Jeff Dea...
arXiv (Cornell University)
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in ...
false
1
The abstract does not discuss algorithmic efficiency, architectural design, data processing/selection, or hardware optimization. It focuses solely on emergent abilities that appear with scale, which are unrelated to efficiency or accessibility improvements for trainers or deployers.
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Not addressed", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.03387472778558731, -0.005439681001007557, -0.023112742230296135, 0.0018034933600574732, -0.001494889729656279, -0.01082603819668293, 0.015019020065665245, -0.029836997389793396, -0.004846761468797922, 0.008462167344987392, -0.05004749074578285, -0.007484156172722578, 0.02076156996190548, -0.02906580455...
{"umap_x": 0.004051003139466047, "umap_y": 1.9048144817352295}
emergent_abilities_of_large_language_models
https://arxiv.org/pdf/2206.07682
Mixed Precision Training
2,022
[["Paulius Micikevicius", ""], ["Sharan Narang", ""], ["Jonah Alben", ""], ["Gregory Diamos", ""], ["Erich Elsen", ""], ["David Garc\u00eda", ""], ["Boris Ginsburg", ""], ["Michael Houston", ""], ["Oleksii Kuchaiev", ""], ["Ganesh Venkatesh", ""], ["Hao Wu", ""]]
arXiv (Cornell University)
Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increases. We introduce a technique to train deep neural networks using hal...
true
4
The paper addresses algorithmic efficiency through half-precision training, reducing memory and compute. It does not propose new architectural designs, data processing strategies, or hardware co-design. Overall, it directly contributes to AI efficiency via computational and memory reduction, especially relevant to trai...
{"algorithmic_efficiency": "Uses half-precision to reduce memory and compute", "architectural_design": "No novel architecture proposed; general model applicability", "data_processing_selection": "Not addressed in the paper", "hardware_optimization": "References future speedup in half-precision hardware"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "efficient_llms_training_and_inference_an_introduction"], "citations": ["mixed_precision_training"]}
{"embedding": [0.007760616019368172, 0.043927550315856934, 0.005336333531886339, -0.00993585865944624, 0.03349929302930832, -0.0020776945166289806, -0.021830303594470024, -0.021009765565395355, -0.029051391407847404, -0.005508284084498882, -0.02093128301203251, 0.0035580124240368605, 0.0012813479406759143, -0.045919828...
{"umap_x": 4.885804653167725, "umap_y": 1.2224352359771729}
mixed_precision_training
https://arxiv.org/pdf/1710.03740
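A minimal sketch of the recipe the abstract describes, expressed with PyTorch's automatic mixed-precision utilities (which post-date the paper but implement the same idea of half-precision compute plus loss scaling); the tiny model and loop here are placeholders.

```python
# Mixed-precision training loop: half-precision compute with dynamic loss scaling (sketch).
import torch
import torch.nn as nn

model = nn.Linear(512, 512).cuda()                 # requires a CUDA device
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler()               # dynamic loss scaling

for _ in range(10):
    x = torch.randn(32, 512, device="cuda")
    with torch.cuda.amp.autocast():                # forward pass runs in half precision
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()                  # scale loss so small FP16 grads don't underflow
    scaler.step(opt)                               # unscales grads; skips the step on inf/NaN
    scaler.update()
    opt.zero_grad(set_to_none=True)
```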
Training Deep Nets with Sublinear Memory Cost
2,022
[["Tianqi Chen", ""], ["Bing Xu", ""], ["Chiyuan Zhang", ""], ["Carlos Guestrin", ""]]
arXiv (Cornell University)
We propose a systematic approach to reduce the memory consumption of deep neural network training. Specifically, we design an algorithm that costs O(sqrt(n)) memory to train a n layer network, with only the computational cost of an extra forward pass per mini-batch. As many of the state-of-the-art models hit the upper ...
true
4
The paper addresses algorithmic efficiency by reducing memory cost with minimal extra computation, using in-place operations and graph analysis. This directly supports training efficiency for deep networks, making it highly relevant to the research question on AI efficiency and accessible training, especially for large...
{"algorithmic_efficiency": "Memory-efficient training via computation trade-off", "architectural_design": "No architectural changes proposed", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": ["training_deep_nets_with_sublinear_memory_cost"]}
{"embedding": [-0.010268733836710453, 0.016851207241415977, -0.010965274646878242, 0.0021358535159379244, 0.035594262182712555, 0.02153593860566616, -0.012308535166084766, -0.050091952085494995, 0.0007843205239623785, 0.009721996262669563, -0.003055001376196742, -0.005251048598438501, -0.007409706711769104, -0.01579955...
{"umap_x": 4.336348056793213, "umap_y": 4.202168941497803}
training_deep_nets_with_sublinear_memory_cost
https://arxiv.org/pdf/1604.06174
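The memory/compute trade described above is what is now commonly exposed as gradient checkpointing; a minimal PyTorch sketch follows, where splitting n layers into roughly sqrt(n) segments mirrors the O(sqrt(n)) memory regime (layer sizes and counts here are arbitrary).

```python
# Gradient checkpointing: store activations only at segment boundaries and
# recompute the rest during the backward pass (illustrative sketch).
import math
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

n_layers = 16
model = nn.Sequential(*[nn.Sequential(nn.Linear(256, 256), nn.ReLU())
                        for _ in range(n_layers)])
segments = int(math.sqrt(n_layers))                # ~sqrt(n) checkpoints

x = torch.randn(8, 256, requires_grad=True)
out = checkpoint_sequential(model, segments, x)    # forward keeps few activations
out.sum().backward()                               # inner activations are recomputed here
```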
GLM-130B: An Open Bilingual Pre-trained Model
2,022
[["Aohan Zeng", ""], ["Xiao Liu", ""], ["Zhengxiao Du", ""], ["Zihan Wang", ""], ["Hanyu Lai", ""], ["Ming Ding", ""], ["Zhuoyi Yang", ""], ["Yifan Xu", ""], ["Wendi Zheng", ""], ["Xiao Xia", ""], ["Weng Lam Tam", ""], ["Zixuan Ma", ""], ["Yufei Xue", ""], ["Jidong Zhai", ""], ["Wenguang Chen", ""], ["Peng Zhang", ""],...
arXiv (Cornell University)
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as good as GPT-3 (davinci) and unveil how models of such a scale can be successfully pre-trained. Over the course of this effort, we face numerous ...
false
4
The paper addresses algorithmic efficiency through INT4 quantization without performance loss and highlights hardware optimization by enabling inference on accessible GPUs. It covers architectural design via a large-scale bilingual pre-trained model and touches on training stability as part of efficiency efforts, thoug...
{"algorithmic_efficiency": "Achieves INT4 quantization with minimal performance loss", "architectural_design": "Bilingual pre-trained architecture optimized for scalability and performance", "data_processing_selection": "None explicitly described", "hardware_optimization": "Enables efficient inference on affordable con...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [-0.0010111982701346278, -0.01597149856388569, -0.0017243103357031941, -0.029269926249980927, 0.02009303867816925, -0.007460724096745253, 0.006205474957823753, -0.018780138343572617, 0.004395610652863979, -0.012588879093527794, -0.03267643600702286, 0.033414945006370544, -0.003024586709216237, -0.02261583...
{"umap_x": 1.1210700273513794, "umap_y": 2.7202999591827393}
glm130b_an_open_bilingual_pretrained_model
https://arxiv.org/pdf/2210.02414
Galactica: A Large Language Model for Science
2,022
[["Ross Taylor", ""], ["Marcin Kardas", ""], ["Guillem Cucurull", ""], ["Thomas Scialom", ""], ["Anthony S. Hartshorn", ""], ["Elvis Saravia", ""], ["Andrew M. Poulton", ""], ["Viktor Kerkez", ""], ["Robert Stojnic", ""]]
arXiv (Cornell University)
Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge ...
false
2
The paper focuses on model performance in scientific reasoning using a large corpus, without addressing algorithmic efficiency, architectural design, data selection, or hardware co-design. While it highlights improved efficiency in reasoning tasks, these gains are not tied to compression, pruning, low-precision methods...
{"algorithmic_efficiency": "Not addressed directly", "architectural_design": "Not specified or evaluated", "data_processing_selection": "Large corpus used; no data selection/synthesis strategies detailed", "hardware_optimization": "Not mentioned"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.0012554387794807553, -0.0030132776591926813, 0.007747704163193703, -0.00015637167962267995, -0.010692249983549118, 0.01825151965022087, -0.031427010893821716, -0.022593168541789055, 0.03717953339219093, -0.008477279916405678, -0.018056418746709824, 0.011033998802304268, 0.03465652838349342, -0.04985615...
{"umap_x": 0.24775800108909607, "umap_y": 3.0810086727142334}
galactica_a_large_language_model_for_science
https://arxiv.org/pdf/2211.09085
Instruction Tuning with GPT-4
2,023
[["Baolin Peng", ""], ["Chunyuan Li", ""], ["Pengcheng He", ""], ["Michel Galley", ""], ["Jianfeng Gao", ""]]
arXiv (Cornell University)
Prior work has shown that finetuning large language models (LLMs) using machine-generated instruction-following data enables such models to achieve remarkable zero-shot capabilities on new tasks, and no human-written instructions are needed. In this paper, we present the first attempt to use GPT-4 to generate instructi...
false
3
The paper addresses data processing through synthetic instruction data generation using GPT-4, improving efficiency in training via reduced human annotation. While it contributes to data selection and algorithmic efficiency in training pipelines, it does not cover architectural design, hardware co-design, or computatio...
{"algorithmic_efficiency": "GPT-4 generates instruction data for efficient fine-tuning", "architectural_design": "No novel architecture proposed; uses existing LLaMA models", "data_processing_selection": "Uses GPT-4 to generate large-scale instruction data via synthesis", "hardware_optimization": "Not addressed"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.02362767606973648, 0.004575033206492662, -0.026395902037620544, -0.006383663974702358, 0.030371027067303658, -0.009539224207401276, -0.011694496497511864, -0.021086435765028, -0.01689642295241356, 0.0018970653181895614, -0.04817136004567146, -0.011033416725695133, 0.0221756249666214, -0.012581203132867...
{"umap_x": 1.9818570613861084, "umap_y": 1.1555919647216797}
instruction_tuning_with_gpt4
https://arxiv.org/pdf/2304.03277
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
2,022
[["Elias Frantar", ""], ["Saleh Ashkboos", ""], ["Torsten Hoefler", ""], ["Dan Alistarh", ""]]
arXiv (Cornell University)
Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart through breakthrough performance across complex language modelling tasks, but also by their extremely high computational and storage costs. Specifically, due to their massive size, even inference for large, highly-accurate GPT models m...
false
4
The paper addresses algorithmic efficiency through high-accuracy post-training quantization, enabling significant computational savings without architectural changes. It enhances accessibility by allowing large models to run on single GPUs, improving deployment feasibility on varied hardware, directly supporting the re...
{"algorithmic_efficiency": "Post-training quantization reduces bitwidth and computational cost", "architectural_design": "Applies to existing GPT architecture without redesign", "data_processing_selection": "Not addressed", "hardware_optimization": "Enables efficient inference on GPUs, supports cost-effective hardware"...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "efficient_llms_training_and_inference_an_introduction", "norm_tweaking_highperformance_lowbit_quantization_of_large_language_models"], "citations": []}
{"embedding": [0.0035308427177369595, 0.019670896232128143, 0.0028999322094023228, 0.02818254381418228, 0.03491412103176117, -0.016553733497858047, 0.0038949630688875914, 0.0047984798438847065, 0.005347609054297209, -0.007755343336611986, -0.03582578897476196, 0.01843646727502346, -0.007554616779088974, -0.023336499929...
{"umap_x": 4.467278003692627, "umap_y": 0.0201785396784544}
gptq_accurate_posttraining_quantization_for_generative_pretrained_transformers
https://arxiv.org/pdf/2210.17323
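GPTQ's one-shot procedure relies on approximate second-order information and is not reproduced here; for orientation, the sketch below shows the much simpler round-to-nearest per-channel weight quantization that such methods improve upon (bit-width and tensor shapes are illustrative).

```python
# Naive round-to-nearest INT4 weight quantization, per output channel (baseline sketch).
import torch

def quantize_rtn(w: torch.Tensor, bits: int = 4):
    qmax = 2 ** (bits - 1) - 1                          # symmetric signed range, e.g. [-7, 7]
    scale = w.abs().amax(dim=1, keepdim=True) / qmax    # one scale per output channel
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q.to(torch.int8), scale                      # 4-bit values stored in int8 for simplicity

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(1024, 1024)
q, s = quantize_rtn(w)
reconstruction_error = (dequantize(q, s) - w).abs().mean()
```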
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
2,023
[["Ying Sheng", ""], ["Lianmin Zheng", ""], ["Binhang Yuan", ""], ["Zhuohan Li", ""], ["Max Ryabinin", ""], ["Daniel Y. Fu", ""], ["Zhiqiang Xie", ""], ["Beidi Chen", ""], ["Clark Barrett", ""], ["Joseph E. Gonzalez", ""], ["Percy Liang", ""], ["Christopher R\u00e9", ""], ["Ion Stoica", ""], ["Ce Zhang", ""]]
arXiv (Cornell University)
The high computational and memory requirements of large language model (LLM) inference make it feasible only with multiple high-end accelerators. Motivated by the emerging demand for latency-insensitive tasks with batched processing, this paper initiates the study of high-throughput LLM inference using limited resource...
false
4
FlexGen improves algorithmic efficiency via 4-bit quantization and tensor compression, enables hardware optimization through dynamic offloading across GPU, CPU, and disk, and operates under constrained resources without changing model architecture or data selection. This paper directly addresses efficiency in inference...
{"algorithmic_efficiency": "4-bit quantization and tensor optimization", "architectural_design": "no novel architecture proposed", "data_processing_selection": "none", "hardware_optimization": "GPU-CPU-disk offloading and resource sharing"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review", "norm_tweaking_highperformance_lowbit_quantization_of_large_language_models"], "citations": []}
{"embedding": [0.02392302267253399, 0.02577134780585766, 0.0018325812416151166, 0.02785620652139187, 0.03660789877176285, 0.019593846052885056, -0.007690931670367718, -0.014957224950194359, -0.00576420035213232, 0.014034590683877468, -0.03839999437332153, 0.014644519425928593, -0.012301272712647915, -0.0157190896570682...
{"umap_x": 3.156960964202881, "umap_y": 5.081453800201416}
flexgen_highthroughput_generative_inference_of_large_language_models_with_a_single_gpu
https://arxiv.org/pdf/2303.06865
Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models
2,024
[["Guangji Bai", ""], ["Zheng Chai", ""], ["Ling Chen", ""], ["Shiyu Wang", ""], ["Jiaying Lu", ""], ["Nan Zhang", ""], ["Tingwei Shi", ""], ["Ziyang Yu", ""], ["Mengdan Zhu", ""], ["Yifei Zhang", ""], ["Carl Yang", ""], ["Yue Cheng", ""], ["Liang Zhao", ""]]
arXiv (Cornell University)
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated models like OpenAI's ChatGPT, represents a significant advancement in artificial intelligence. These models, however, bring forth substantial challenges in the high consumption of computational, memory, energy, and financial resources, e...
false
4
The abstract addresses algorithmic efficiency through compression and pruning, architectural design via optimized model structures, data processing via selection and preprocessing, and hardware optimization through system-level co-design. It broadly aligns with the research question by covering core aspects of AI effic...
{"algorithmic_efficiency": "Covers compression, pruning, quantization techniques", "architectural_design": "Reviews efficient model architectures for speed and memory", "data_processing_selection": "Discusses selection and preprocessing strategies in training", "hardware_optimization": "Mentions co-design with hardware...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.011524825356900692, 0.0014836908085271716, -0.014140969142317772, 0.01178088691085577, 0.018526863306760788, 0.026597237214446068, -0.022126303985714912, -0.013451525941491127, 0.015063799917697906, 0.01785143092274666, -0.03228159248828888, 0.005612108390778303, -0.012267298996448517, -0.0183880478143...
{"umap_x": 2.682482957839966, "umap_y": 4.318592548370361}
beyond_efficiency_a_systematic_survey_of_resourceefficient_large_language_models
https://arxiv.org/pdf/2401.00625
Accelerating LLM Inference with Staged Speculative Decoding
2,023
[["Benjamin Spector", ""], ["Chris R\u00e9", ""]]
arXiv (Cornell University)
Recent advances with large language models (LLM) illustrate their diverse capabilities. We propose a novel algorithm, staged speculative decoding, to accelerate LLM inference in small-batch, on-device scenarios. We address the low arithmetic intensity of small-batch inference by improving upon previous work in speculat...
false
4
The paper addresses algorithmic efficiency through staged speculative decoding, improving inference speed in small-batch on-device settings. It does not introduce new architectural designs, data selection strategies, or hardware co-design; relevance is limited to inference efficiency and edge deployment context.
{"algorithmic_efficiency": "Staged speculative decoding reduces per-batch latency", "architectural_design": "No new architecture proposed; algorithmic improvement only", "data_processing_selection": "Not addressed", "hardware_optimization": "Focused on on-device inference, relevant to edge deployment"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.004690984729677439, -0.00908802729099989, 0.0012481240555644035, 0.011328754015266895, 0.049908384680747986, 0.025987612083554268, 0.01085615437477827, -0.04266718775033951, 0.02373083308339119, 0.006975327618420124, -0.03988121077418327, -0.0036921226419508457, 0.004101686645299196, -0.003187862923368...
{"umap_x": 2.7821357250213623, "umap_y": 5.090044975280762}
accelerating_llm_inference_with_staged_speculative_decoding
https://arxiv.org/pdf/2308.04623
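For context, the sketch below shows plain speculative decoding with greedy verification, the idea the staged variant builds on; `draft_next` and `target_next` are placeholder callables, and a real implementation verifies all drafted tokens in a single batched forward pass of the target model rather than one call per position.

```python
# Plain speculative decoding with greedy verification (simplified sketch).
def speculative_decode(prompt, draft_next, target_next, k=4, max_new=32):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # 1) the cheap draft model proposes k tokens autoregressively
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) the expensive target model checks the proposals left to right
        accepted = []
        for t in proposal:
            verified = target_next(tokens + accepted)
            if verified == t:
                accepted.append(t)            # agreement: keep the drafted token
            else:
                accepted.append(verified)     # disagreement: take the target's token and stop
                break
        tokens.extend(accepted)
    return tokens
```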
Fast Inference of Mixture-of-Experts Language Models with Offloading
2,023
[["Artyom Eliseev", ""], ["D. Peter Mazur", ""]]
arXiv (Cornell University)
With the widespread adoption of Large Language Models (LLMs), many deep learning practitioners are looking for strategies of running these models more efficiently. One such strategy is to use sparse Mixture-of-Experts (MoE) - a type of model architectures where only a fraction of model layers are active for any given i...
false
4
The paper addresses algorithmic efficiency through quantization and sparse activation in MoE models, highlights architectural design via sparse expert routing, and advances hardware optimization through offloading to consumer devices. This directly supports the research question by improving AI accessibility and effici...
{"algorithmic_efficiency": "Quantization and sparse activation enable faster inference", "architectural_design": "Sparse MoE architecture leverages active experts for efficiency", "data_processing_selection": "NONE", "hardware_optimization": "Offloading to consumer hardware improves accessibility"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.029105959460139275, 0.039964769035577774, -0.01621079444885254, -0.004242763388901949, 0.03127654641866684, 0.016400039196014404, 0.0004129675216972828, -0.03340347483754158, 0.03482428193092346, -0.002455496694892645, -0.00493277981877327, 0.027154477313160896, 0.012965243309736252, 0.0011931278277188...
{"umap_x": 3.718214750289917, "umap_y": 5.1356706619262695}
fast_inference_of_mixtureofexperts_language_models_with_offloading
https://arxiv.org/pdf/2312.17238
Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Survey
2,023
[["Lovre Torbarina", ""], ["Tin Ferkovic", ""], ["\u0141ukasz Roguski", ""], ["Velimir Mihel\u010di\u0107", ""], ["Bruno \u0160arlija", ""], ["\u017deljko Kraljevi\u0107", ""]]
arXiv (Cornell University)
The increasing adoption of natural language processing (NLP) models across industries has led to practitioners' need for machine learning systems to handle these models efficiently, from training to serving them in production. However, training, deploying, and updating multiple models can be complex, costly, and time-c...
false
3
The paper addresses algorithmic efficiency through multi-task learning (MTL) to reduce redundant training. It examines architectural design via transformer-based MTL frameworks optimized for shared representations. Data processing challenges are highlighted in data engineering phases. While hardware optimization is not...
{"algorithmic_efficiency": "Focuses on multi-task training for shared representation efficiency", "architectural_design": "Explores transformer-based MTL architectures for joint learning efficiency", "data_processing_selection": "Discusses data engineering challenges in multi-task settings", "hardware_optimization": "N...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [-0.004151456989347935, 0.00766021478921175, -0.009559391066432, 0.019604580476880074, -0.0017366099637001753, -0.00697768060490489, 0.01309115719050169, -0.031573548913002014, 0.011143923737108707, -0.0228771660476923, 0.012646044604480267, -0.0044882274232804775, 0.015259669162333012, -0.038288235664367...
{"umap_x": 0.4858739972114563, "umap_y": 4.246918201446533}
challenges_and_opportunities_of_using_transformerbased_multitask_learning_in_nlp_through_ml_lifecycle_a_survey
https://arxiv.org/pdf/2308.08234
Multilevel Large Language Models for Everyone
2,023
[["Yuanhao Gong", ""]]
arXiv (Cornell University)
Large language models have made significant progress in the past few years. However, they are either generic {\it or} field specific, splitting the community into different groups. In this paper, we unify these large language models into a larger map, where the generic {\it and} specific models are linked together and ...
false
4
The paper introduces a multilevel architectural design that enhances efficiency through hierarchical model linking, reducing redundancy and improving performance. It supports algorithmic efficiency via localized, privacy-preserving inference and enables hardware optimization by running user-level models on local device...
{"algorithmic_efficiency": "Multilevel structure reduces redundancy", "architectural_design": "Hierarchical models with global/field/user levels", "data_processing_selection": "User input and internet data used for dynamic model linking", "hardware_optimization": "User-level models run locally for privacy and efficienc...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.02340194210410118, 0.002977262483909726, 0.011995568871498108, 0.011049370281398296, 0.016527913510799408, 0.004396013915538788, 0.005121681373566389, -0.025312338024377823, 0.001987339463084936, -0.00883818045258522, -0.016416940838098526, 0.013631931506097317, 0.004597695544362068, -0.031988713890314...
{"umap_x": 0.00014141207793727517, "umap_y": 2.6859841346740723}
multilevel_large_language_models_for_everyone
https://arxiv.org/pdf/2307.13221
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
2,023
[["Hongye Jin", ""], ["Xiaotian Han", ""], ["Jingfeng Yang", ""], ["Zexuan Jiang", ""], ["Chia-Yuan Chang", ""], ["Xia Hu", ""]]
arXiv (Cornell University)
The evolving sophistication and intricacies of Large Language Models (LLMs) yield unprecedented advancements, yet they simultaneously demand considerable computational resources and incur significant costs. To alleviate these challenges, this paper introduces a novel, simple, and effective method named ``\growlength'' ...
false
4
The paper addresses algorithmic efficiency by progressively increasing sequence length to reduce computational cost per token, without altering model architecture or data processing. It improves training efficiency through a simple, resource-aware scheduling strategy, directly relevant to AI efficiency in pretraining. ...
{"algorithmic_efficiency": "Progressive sequence length growth reduces training cost", "architectural_design": "No architectural changes to base model structure", "data_processing_selection": "None; focuses on training length, not data selection", "hardware_optimization": "None; software-only pretraining acceleration"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.006965572014451027, 0.017988016828894615, -0.018366701900959015, -0.0016779806464910507, 0.015169120393693447, -0.006533289328217506, 0.018407372757792473, -0.038902029395103455, 0.0072035291232168674, 0.01112855039536953, -0.05346403643488884, -0.009590966627001762, -0.0007764165056869388, -0.00546367...
{"umap_x": 1.605208158493042, "umap_y": 2.6041340827941895}
growlength_accelerating_llms_pretraining_by_progressively_growing_training_length
https://arxiv.org/pdf/2310.00576
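The excerpt does not give the paper's actual schedule, so the staged schedule below is purely hypothetical; it only illustrates the mechanism of spending early training steps on short, cheap sequences and reserving the full context length for the end of pretraining.

```python
# Hypothetical progressive sequence-length schedule (illustrative only).
def seq_len_at(step: int, total_steps: int,
               stages=(128, 256, 512, 1024, 2048)) -> int:
    frac = min(step / max(total_steps, 1), 1.0)
    idx = min(int(frac * len(stages)), len(stages) - 1)
    return stages[idx]

lengths = [seq_len_at(s, 10_000) for s in (0, 2_500, 5_000, 9_999)]
# lengths == [128, 256, 512, 2048]: per-step cost grows only as training progresses
```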
FPTQ: Fine-grained Post-Training Quantization for Large Language Models
2,023
[["Qingyuan Li", ""], ["Yifan Zhang", ""], ["Liang Li", ""], ["Peng Yao", ""], ["Bo Zhang", ""], ["Xiangxiang Chu", ""], ["Yerui Sun", ""], ["Li Du", ""], ["Yuchen Xie", ""]]
arXiv (Cornell University)
In the era of large-scale language models, the substantial parameter size poses significant challenges for deployment. Being a prevalent compression technique, quantization has emerged as the mainstream practice to tackle this issue, which is mainly centered on two recipes W8A8 and W4A16 (i.e. weights and activations i...
false
4
The paper addresses algorithmic efficiency through fine-grained post-training quantization, improving computational efficiency via W4A8 schemes. It directly supports the research question by enhancing AI model efficiency for deployment without requiring retraining or architectural redesign, making it highly relevant to...
{"algorithmic_efficiency": "Proposes W4A8 post-training quantization for LLMs", "architectural_design": "No architectural changes; focuses on quantization post-training", "data_processing_selection": "Not applicable", "hardware_optimization": "Not explicitly addressed"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.02657046541571617, 0.025776835158467293, -0.024764645844697952, 0.015532885678112507, 0.03625849261879921, 0.0070215496234595776, 0.022017624229192734, -0.005888413172215223, -0.003356204368174076, -0.015175385400652885, -0.020809559151530266, -0.0032545137219130993, -0.014427152462303638, -0.033521380...
{"umap_x": 4.385741233825684, "umap_y": 0.22938884794712067}
fptq_finegrained_posttraining_quantization_for_large_language_models
https://arxiv.org/pdf/2308.15987
A Hardware Evaluation Framework for Large Language Model Inference
2,023
[["Hengrui Zhang", ""], ["August Ning", ""], ["R.V.S.N. Prabhakar", ""], ["David Wentzlaff", ""]]
arXiv (Cornell University)
The past year has witnessed the increasing popularity of Large Language Models (LLMs). Their unprecedented scale and associated high hardware cost have impeded their broader adoption, calling for efficient hardware designs. With the large hardware needed to simply run LLM inference, evaluating different hardware design...
false
4
The abstract addresses architectural design through cost-effective hardware trade-offs and hardware optimization via performance/cost modeling and mapping strategies. It directly supports the research question by focusing on hardware efficiency and accessibility for LLM deployment, though it does not cover algorithmic ...
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Addressed via hardware design trade-offs", "data_processing_selection": "Not addressed", "hardware_optimization": "Core focus: cost-effective hardware co-design"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.004098943900316954, -0.002001148648560047, -0.014518548734486103, 0.016201043501496315, 0.028219370171427727, 0.03111385554075241, 0.000799558765720576, -0.021377194672822952, 0.00010990699229296297, 0.03638988360762596, -0.0327417217195034, -0.013194704428315163, -0.0183564480394125, 0.023219574242830...
{"umap_x": 3.0029358863830566, "umap_y": 4.909433364868164}
a_hardware_evaluation_framework_for_large_language_model_inference
https://arxiv.org/pdf/2312.03134
FP8-LM: Training FP8 Large Language Models
2,023
[["Houwen Peng", ""], ["Kan Wu", ""], ["Yixuan Wei", ""], ["Guoshuai Zhao", ""], ["Yuxiang Yang", ""], ["Ze Liu", ""], ["Yifan Xiong", ""], ["Ziyue Yang", ""], ["Bolin Ni", ""], ["Jingcheng Hu", ""], ["Ruihang Li", ""], ["Miaosen Zhang", ""], ["Chen Li", ""], ["Ning Jia", ""], ["Ruizhe Wang", ""], ["Zheng Zhang", ""], ...
arXiv (Cornell University)
In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables, such as gradients and optimizer states, in LLM training can employ low-precision data formats without compromising model accuracy and requiring no changes to hyper-parameter...
false
4
The paper addresses algorithmic efficiency through low-precision training (FP8), which reduces memory and improves speed. It does not propose new architectures, data processing strategies, or hardware co-design. This directly supports efficiency in training LLMs, making it relevant to the research question's focus on c...
{"algorithmic_efficiency": "Uses FP8 low-precision training to reduce computational cost", "architectural_design": "No novel architecture proposed; focuses on training framework", "data_processing_selection": "Not addressed; no data selection or preprocessing discussed", "hardware_optimization": "Implied via H100 GPU p...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.03430480882525444, 0.050982676446437836, 0.0028122789226472378, 0.012120760977268219, 0.013811984099447727, -0.0026216921396553516, -0.001583805656991899, -0.010940451174974442, -0.007847306318581104, -0.01155836135149002, -0.029502013698220253, 0.009281092323362827, -0.010801919735968113, -0.035308659...
{"umap_x": 4.3147358894348145, "umap_y": 0.39971452951431274}
fp8lm_training_fp8_large_language_models
https://arxiv.org/pdf/2310.18313
A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities
2,022
[["Fuxun Yu", ""], ["Di Wang", ""], ["Longfei Shangguan", ""], ["Minjia Zhang", ""], ["Xulong Tang", ""], ["Chenchen Liu", ""], ["Xiang Chen", ""]]
arXiv (Cornell University)
Deep Learning (DL) models have achieved superior performance in many application domains, including vision, language, medical, commercial ads, entertainment, etc. With the fast development, both DL applications and the underlying serving hardware have demonstrated strong scaling trends, i.e., Model Scaling and Compute ...
false
3
The abstract focuses on serving system optimization rather than algorithmic, architectural, data, or hardware efficiency at the model level. It touches on hardware scaling and accelerator use but does not discuss model compression, lightweight designs, data selection, or co-design for efficiency in training/deployment....
{"algorithmic_efficiency": "Not directly addressed", "architectural_design": "Not directly addressed", "data_processing_selection": "Not directly addressed", "hardware_optimization": "Implied in serving system scaling and accelerator integration"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [-0.0016721459105610847, 0.030678575858473778, 0.001626033685170114, -0.004230379592627287, 0.007559595163911581, 0.013537815771996975, 0.03150627389550209, -0.0239376500248909, 0.01371136773377657, 0.019266443327069283, 0.004370516166090965, -0.01210783515125513, 0.02688843570649624, -0.0291058998554945,...
{"umap_x": 3.3633322715759277, "umap_y": 4.087941646575928}
a_survey_of_largescale_deep_learning_serving_system_optimization_challenges_and_opportunities
https://arxiv.org/pdf/2111.14247
QuantEase: Optimization-based Quantization for Language Models
2,023
[["Kayhan Behdin", ""], ["Ayan Acharya", ""], ["Aman Gupta", ""], ["Sathiya Keerthi", ""], ["Rahul Mazumder", ""]]
arXiv (Cornell University)
With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing from recent advances, our work introduces QuantEase, a layer-wise quantization f...
false
4
The paper addresses algorithmic efficiency through layer-wise post-training quantization using coordinate descent, with outlier-aware refinement for high-precision retention. It contributes to hardware optimization via efficient GPU deployment of quantized models, particularly on A100, and shows strong performance gain...
{"algorithmic_efficiency": "Layer-wise PTQ with CD-based optimization", "architectural_design": "No architectural modifications; focuses on quantization", "data_processing_selection": "None addressed in the abstract", "hardware_optimization": "Efficient GPU execution on A100; co-design with hardware deployment"}
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.02397461049258709, 0.005251666996628046, -0.01302356831729412, 0.010427404195070267, 0.025579234585165977, 0.027170270681381226, 0.013419290073215961, -0.012123478576540947, 0.002885440131649375, 0.007296632509678602, -0.03953297436237335, -0.0047563412226736546, -0.013125701807439327, -0.0399304553866...
{"umap_x": 4.465002059936523, "umap_y": 0.1609945148229599}
quantease_optimizationbased_quantization_for_language_models
https://arxiv.org/pdf/2309.01885
QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
2,023
[["Zhikai Li", ""], ["Xianxiang Liu", ""], ["Banghua Zhu", ""], ["Zhen Dong", ""], ["Qingyi Gu", ""], ["Kurt Keutzer", ""]]
arXiv (Cornell University)
Large Language Models (LLMs) have showcased remarkable impacts across a wide spectrum of natural language processing tasks. Fine-tuning these pretrained models on downstream datasets provides further significant performance gains; however, this process typically requires a large number of expensive, high-end GPUs. Alth...
false
4
The paper enhances algorithmic efficiency through quantization of weights, gradients, and optimizer states, reducing memory usage by 79% without performance loss. It enables full-parameter fine-tuning on affordable hardware, improving accessibility for trainers and deployers, directly aligning with the research questio...
{"algorithmic_efficiency": "Quantization reduces memory and computation via INT8 format", "architectural_design": "No novel architecture proposed; focuses on training pipeline optimization", "data_processing_selection": "Not addressed in the abstract", "hardware_optimization": "Enables use of existing GPUs (e.g., A6000...
{"references": ["achieving_peak_performance_for_large_language_models_a_systematic_review"], "citations": []}
{"embedding": [0.0071003432385623455, 0.034356698393821716, -0.00010983951506204903, -0.021718133240938187, 0.03852447122335434, 0.011878672987222672, 0.00916975736618042, -0.006905450485646725, -0.01152130588889122, 0.02457798644900322, -0.02071438916027546, 0.001647834898903966, 0.010156773030757904, -0.0496388673782...
{"umap_x": 4.254105567932129, "umap_y": 0.20999328792095184}
qft_quantized_fullparameter_tuning_of_llms_with_affordable_resources
https://arxiv.org/pdf/2310.07147
Exploring Material Design Space with a Deep-Learning Guided Genetic Algorithm
2,022
[["Diederik P. Kingma", ""], ["Jimmy Ba", ""]]
VUBIR (Vrije Universiteit Brussel)
Designing complex, dynamic yet multi-functional materials and devices is challenging because the design spaces for these materials have numerous interdependent and often conflicting constraints. Taking inspiration from advances in artificial intelligence and their applications in material discovery, we propose a comput...
false
3
The paper uses a deep learning model to efficiently score and guide design iterations, enhancing algorithmic efficiency through classification. It relies on a genetic algorithm for design selection and optimization, representing data processing selection via simulation and feedback loops. However, it does not focus on ...
{"algorithmic_efficiency": "Deep learning scores simulation outputs for efficient design iteration", "architectural_design": "No specific model architecture proposed for efficiency", "data_processing_selection": "Uses simulation and selection via genetic algorithm for design filtering", "hardware_optimization": "Not ad...
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [0.009397474117577076, -0.003039266914129257, 0.026854636147618294, 0.018171291798353195, 0.0052450280636549, -0.022266127169132233, -0.03669273480772972, -0.007724152412265539, 0.02912774495780468, 0.008807561360299587, -0.016548797488212585, 0.05836021155118942, 0.008996633812785149, -0.0154757080599665...
{"umap_x": 2.089662551879883, "umap_y": 3.8895814418792725}
exploring_material_design_space_with_a_deeplearning_guided_genetic_algorithm
null
BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
2,022
[["Elad Ben Zaken", "Laboratoire d'Informatique de Paris-Nord"], ["Yoav Goldberg", "Laboratoire d'Informatique de Paris-Nord"], ["Shauli Ravfogel", "Laboratoire d'Informatique de Paris-Nord"]]
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset of them) are being modified. We show that with small-to-medium training data, applying BitFit on pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. For larger data, the ...
false
3
BitFit improves efficiency by selectively updating biases, enhancing algorithmic efficiency without altering architecture. It offers a lightweight, parameter-efficient approach relevant to AI accessibility for trainers. While it touches on efficiency, it does not involve data selection or hardware co-design, limiting b...
{"algorithmic_efficiency": "Sparse bias updating reduces training cost", "architectural_design": "No architectural changes to transformer base", "data_processing_selection": "Not addressed in the abstract", "hardware_optimization": "Not addressed in the abstract"}
{"references": ["lora_lowrank_adaptation_of_large_language_models", "efficient_llms_training_and_inference_an_introduction"], "citations": []}
{"embedding": [0.011544919572770596, 0.017405569553375244, -0.014123649336397648, 0.01283169724047184, 0.018173139542341232, 0.002170075662434101, 0.039726804941892624, -0.007599906995892525, -0.016264453530311584, -0.011280042119324207, -0.011205479502677917, -0.015551426447927952, -0.0015013553202152252, -0.039635945...
{"umap_x": 3.449153423309326, "umap_y": 0.6191572546958923}
bitfit_simple_parameterefficient_finetuning_for_transformerbased_masked_languagemodels
https://aclanthology.org/2022.acl-short.1.pdf
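The bias-only fine-tuning described in the abstract is simple enough to sketch directly: freeze every parameter whose name is not a bias term. The transformer module below is just a stand-in for a pretrained BERT-style model.

```python
# BitFit-style sparse fine-tuning: only bias vectors receive gradients (sketch).
import torch.nn as nn

def apply_bitfit(model: nn.Module) -> int:
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith(".bias") or name == "bias"
        if param.requires_grad:
            trainable += param.numel()
    return trainable

model = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model=256, nhead=4),
                              num_layers=2)
n_trainable = apply_bitfit(model)    # typically a tiny fraction of total parameters
```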
Feature Purification: How Adversarial Training Performs Robust Deep Learning
2,022
[["Zeyuan Allen-Zhu", "Microsoft (United States)"], ["Yuanzhi Li", "Carnegie Mellon University"]]
Despite the empirical success of using adversarial training to defend deep learning models against adversarial perturbations, so far, it still remains rather unclear what the principles are behind the existence of adversarial perturbations, and what adversarial training does to the neural network to remove them. In thi...
false
3
The paper establishes that adversarial training purifies the mixtures accumulated in hidden weights, improving model robustness; this bears on algorithmic efficiency only through its effect on training dynamics. It does not involve architectural innovations, data processing, or hardware optimization, thus limiting its direct relevance to the research question...
{"algorithmic_efficiency": "Adversarial training enhances model robustness via gradient-based perturbations", "architectural_design": "No specific architectural changes proposed; applies to general neural networks", "data_processing_selection": "No data selection, synthesis, or preprocessing strategies discussed", "har...
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [0.0004533011233434081, 0.027196208015084267, 0.01210768148303032, 0.03065396286547184, -0.005003484431654215, 0.010625001043081284, 0.000997791183181107, -0.013268725946545601, -0.024709295481443405, 0.016307218000292778, 0.027397217229008675, 0.05365484952926636, -0.019341163337230682, -0.04676495864987...
{"umap_x": 3.977673292160034, "umap_y": 2.4711735248565674}
feature_purification_how_adversarial_training_performs_robust_deep_learning
null
Overcoming catastrophic forgetting in neural networks
2,022
[["James Kirkpatrick", "DeepMind (United Kingdom)"], ["Razvan Pascanu", "DeepMind (United Kingdom)"], ["Neil C. Rabinowitz", "DeepMind (United Kingdom)"], ["Joel Veness", "DeepMind (United Kingdom)"], ["Guillaume Desjardins", "DeepMind (United Kingdom)"], ["Andrei A. Rusu", "DeepMind (United Kingdom)"], ["Kieran Milan"...
arXiv (Cornell University)
The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this lim...
false
3
The paper addresses algorithmic efficiency via selective learning slowdown to prevent catastrophic forgetting, but does not propose new architectures, data processing methods, or hardware optimizations. While it contributes to AI efficiency in training sequential tasks, it does not directly address the research questio...
{"algorithmic_efficiency": "Selective weight learning slowdown", "architectural_design": "No novel architecture proposed", "data_processing_selection": "No data selection or preprocessing strategy", "hardware_optimization": "No hardware or accelerator co-design"}
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [0.0070702433586120605, 0.02575031854212284, -0.01903538778424263, 0.021725112572312355, 0.023151583969593048, 0.004996602889150381, -0.0011994371889159083, -0.03675510361790657, -0.016294876113533974, 0.009281368926167488, 0.007793149445205927, 0.011716107837855816, 0.02775885909795761, -0.01032738480716...
{"umap_x": 4.181629657745361, "umap_y": 2.284139394760132}
overcoming_catastrophic_forgetting_in_neural_networks
null
Decoupled Weight Decay Regularization
2,022
[["Ilya Loshchilov", ""], ["Frank Hutter", ""]]
arXiv (Cornell University)
L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is \emph{not} the case for adaptive gradient algorithms, such as Adam. While common implementations of these algorithms employ L$_2$ regularizatio...
false
3
The abstract addresses algorithmic efficiency by modifying weight decay in adaptive optimizers like Adam, improving generalization and decoupling decay from learning rate settings. While it does not involve data processing, architectural design, or hardware optimization, it directly contributes to making training more ...
{"algorithmic_efficiency": "Improves optimization efficiency in adaptive algorithms", "architectural_design": "Not applicable", "data_processing_selection": "Not applicable", "hardware_optimization": "Not applicable"}
{"references": ["lora_lowrank_adaptation_of_large_language_models", "alphatuning_quantizationaware_parameterefficient_adaptation_of_largescale_pretrained_language_models"], "citations": []}
{"embedding": [-0.01184110064059496, 0.01621609926223755, -0.005301935598254204, -0.013153186067938805, 0.01758531481027603, 0.036833204329013824, 0.004620287101715803, -0.004907745867967606, -0.008749748580157757, 0.024373788386583328, -0.01620720699429512, 0.007813624106347561, 0.0005043781129643321, -0.0337787307798...
{"umap_x": 4.110099792480469, "umap_y": 2.4241833686828613}
decoupled_weight_decay_regularization
https://arxiv.org/pdf/1711.05101
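The decoupling described in the abstract can be summarized schematically: with L2 regularization the decay term flows through Adam's adaptive rescaling, whereas the decoupled form applies it directly to the weights ($\hat m_t$, $\hat v_t$ are the bias-corrected moment estimates, $\eta$ the learning rate, $\lambda$ the decay coefficient; schedule multipliers are omitted).

```latex
\text{Adam + } L_2:\quad g_t = \nabla f(\theta_{t-1}) + \lambda\,\theta_{t-1},\qquad
\theta_t = \theta_{t-1} - \eta\,\frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\\[4pt]
\text{decoupled (AdamW):}\quad g_t = \nabla f(\theta_{t-1}),\qquad
\theta_t = \theta_{t-1} - \eta\,\frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} - \eta\,\lambda\,\theta_{t-1}
```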
Language Models are Few-Shot Learners
2,022
[["T. B. Brown", ""], ["Benjamin Mann", ""], ["Nick Ryder", ""], ["Melanie Subbiah", ""], ["Jared Kaplan", ""], ["Prafulla Dhariwal", ""], ["Arvind Neelakantan", ""], ["Pranav Shyam", ""], ["Girish Sastry", ""], ["Amanda Askell", ""], ["Sandhini Agarwal", ""], ["Ariel Herbert-Voss", ""], ["Gretchen Krueger", ""], ["Tom...
arXiv (Cornell University)
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples...
false
4
The paper highlights algorithmic efficiency through zero-shot/few-shot learning, leverages architectural scale for broad applicability, reduces data processing demands via minimal examples, and does not address hardware-specific optimization. It directly relates to making AI more accessible by enabling task adaptation ...
{"algorithmic_efficiency": "Few-shot learning via text interaction without training", "architectural_design": "Scalable autoregressive architecture enables task-agnostic performance", "data_processing_selection": "Few-shot demonstrations reduce data needs significantly", "hardware_optimization": "None mentioned"}
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [0.02560979686677456, 0.01858886517584324, 0.009291233494877815, -0.01141581404954195, 0.002741728676483035, 0.018015457317233086, 0.004652155097573996, -0.042192064225673676, -0.0030511191580444574, 0.006189063191413879, -0.022126812487840652, 0.006093230564147234, 0.008582998998463154, -0.01351233571767...
{"umap_x": 1.5133603811264038, "umap_y": 2.127261161804199}
language_models_are_fewshot_learners
https://arxiv.org/pdf/2005.14165
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
2,022
[["Victor W. Zhong", ""], ["Caiming Xiong", ""], ["Richard Socher", ""]]
arXiv (Cornell University)
A significant amount of the world's knowledge is stored in relational databases. However, the ability for users to retrieve facts from a database is limited due to a lack of understanding of query languages such as SQL. We propose Seq2SQL, a deep neural network for translating natural language questions to correspondin...
false
3
The paper addresses algorithmic efficiency through reinforcement learning to improve query generation accuracy. It uses a specific architectural design that leverages SQL structure to reduce output space. Data processing relies on a large curated dataset for training, though no data selection or synthesis strategies ar...
{"algorithmic_efficiency": "Reinforcement learning for query generation efficiency", "architectural_design": "Sequence-to-sequence model with structured query output", "data_processing_selection": "Uses large hand-annotated dataset for training", "hardware_optimization": "None mentioned"}
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [-0.008464434184134007, 0.005215497221797705, -0.007602031342685223, 0.02236621454358101, 0.024939091876149178, 0.027508040890097618, 0.007205411791801453, -0.002057467820122838, -0.0011065683793276548, 0.012695111334323883, -0.03831882402300835, 0.007555806543678045, 0.02673533372581005, 0.02752967923879...
{"umap_x": 1.1234087944030762, "umap_y": 2.9277100563049316}
seq2sql_generating_structured_queries_from_natural_language_using_reinforcement_learning
https://arxiv.org/pdf/1709.00103
Predicting Parameters in Deep Learning
2,022
[["Misha Denil", "University of Oxford"], ["Babak Shakibi", "University of British Columbia"], ["Laurent Dinh", "Universit\u00e9 de Montr\u00e9al"], ["Marc\u2019Aurelio Ranzato", "Meta (Israel)"], ["Nando de Freitas", "University of British Columbia"]]
arXiv (Cornell University)
We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be lear...
false
4
The paper addresses algorithmic efficiency by reducing the number of parameters learned through prediction, allowing significant weight reduction without accuracy loss. This directly improves training and inference efficiency, aligning with the research question's focus on efficiency in AI systems, though no architectu...
{"algorithmic_efficiency": "Weight prediction reduces parameter learning", "architectural_design": "Models trained with predicted parameters retain performance", "data_processing_selection": "None", "hardware_optimization": "None"}
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [0.002469968516379595, 0.014963985420763493, -0.026777327060699463, 0.01865798607468605, -0.0008886177092790604, -0.011313341557979584, -0.008303527720272541, -0.0303428303450346, -0.010806499049067497, 0.032543834298849106, -0.0014265034114941955, 0.0048087951727211475, -0.022253086790442467, -0.03971042...
{"umap_x": 4.069754123687744, "umap_y": 2.7750396728515625}
predicting_parameters_in_deep_learning
https://arxiv.org/pdf/1306.0543
Speeding up Convolutional Neural Networks with Low Rank Expansions
2,022
[["Max Jaderberg", "University of Oxford"], ["Andrea Vedaldi", "University of Oxford"], ["Andrew Zisserman", "University of Oxford"]]
arXiv (Cornell University)
The focus of this paper is speeding up the evaluation of convolutional neural networks. While delivering impressive results across a range of computer vision and machine learning tasks, these networks are computationally demanding, limiting their deployability. Convolutional layers generally consume the bulk of the pro...
false
4
The paper addresses algorithmic efficiency through low-rank filter expansion, reducing computational cost in convolutional layers without accuracy loss. It offers a general, hardware-agnostic method that improves inference speed and is applicable across architectures, directly supporting AI efficiency for deployers. Th...
{"algorithmic_efficiency": "Low-rank filter compression reduces computation", "architectural_design": "Architecture-agnostic speedup applicable to existing models", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [0.008638177998363972, -0.004524416755884886, 0.01668434776365757, -0.025224387645721436, 0.019220389425754547, 0.05526536703109741, 0.008548814803361893, -0.008667643181979656, 0.019135527312755585, -0.004393539857119322, -0.04014981538057327, 0.0037337904796004295, -0.03166889026761055, -0.0157743096351...
{"umap_x": 3.6240036487579346, "umap_y": 2.2749950885772705}
speeding_up_convolutional_neural_networks_with_low_rank_expansions
https://arxiv.org/pdf/1405.3866
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
2,022
[["Dmitry Lepikhin", ""], ["HyoukJoong Lee", ""], ["Yuanzhong Xu", ""], ["Dehao Chen", ""], ["Orhan F\u0131rat", ""], ["Yanping Huang", ""], ["Maxim Krikun", ""], ["Noam Shazeer", ""], ["Zhifeng Chen", ""]]
arXiv (Cornell University)
Neural network scaling has been critical for improving the model quality in many real-world machine learning applications with vast amounts of training data and compute. Although this trend of scaling is affirmed to be a sure-fire approach for better model quality, there are challenges on the path such as the computati...
false
4
The paper addresses algorithmic efficiency through conditional computation and sparsity in a mixture-of-experts design, supports architectural scaling for large models, and is hardware-optimized for TPUs via XLA. It directly relates to efficient AI training for large-scale deployment and accessible training via automat...
{"algorithmic_efficiency": "Uses conditional computation and sparse gating", "architectural_design": "Enables scaling via mixture-of-experts architecture", "data_processing_selection": "NONE", "hardware_optimization": "Optimized for TPUv3 accelerators via XLA integration"}
{"references": ["lora_lowrank_adaptation_of_large_language_models", "sequence_parallelism_long_sequence_training_from_system_perspective", "deepspeed_inference_enabling_efficient_inference_of_transformer_models_at_unprecedented_scale"], "citations": []}
{"embedding": [-0.012100476771593094, 0.0002512754872441292, 0.00165158836171031, -0.005959854926913977, 0.027359772473573685, -0.006937849801033735, 0.014330771751701832, -0.017052948474884033, -0.0028369794599711895, 0.0014462778344750404, -0.010524846613407135, -0.0024162577465176582, 0.006662970408797264, -0.003311...
{"umap_x": 3.188305616378784, "umap_y": 4.136420726776123}
gshard_scaling_giant_models_with_conditional_computation_and_automatic_sharding
https://arxiv.org/pdf/2006.16668
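The conditional computation mentioned in the abstract is typically realized as a gated mixture-of-experts layer in which each token is routed to a small number of experts; the sketch below shows top-2 gating only, leaving out the capacity limits, load-balancing auxiliary loss, and cross-accelerator sharding that a real GShard-style layer needs.

```python
# Top-2 gated mixture-of-experts layer (simplified, single-device sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=256, d_ff=512, n_experts=8, k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                                  # x: (tokens, d_model)
        weights, indices = F.softmax(self.gate(x), dim=-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                         # each token visits only k experts
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMoE()
y = moe(torch.randn(16, 256))                              # (16, 256)
```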
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
2,022
[["Yuanzhi Li", "Stanford University"], ["Yingyu Liang", "University of Wisconsin\u2013Madison"]]
arXiv (Cornell University)
Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In ...
false
3
The paper focuses on algorithmic learning dynamics in overparameterized networks using SGD, with theoretical and empirical analysis on synthetic and MNIST data. While it contributes to understanding training efficiency in neural networks, it does not address architectural optimization, data selection, or hardware co-de...
{"algorithmic_efficiency": "SGD optimization in overparameterized settings", "architectural_design": "Two-layer ReLU network studied in overparameterized regime", "data_processing_selection": "No data selection or preprocessing discussed", "hardware_optimization": "No hardware co-design or optimization mentioned"}
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [0.03902643546462059, 0.024980643764138222, -0.014193999581038952, 0.006801670882850885, 0.0100477933883667, -0.010264947079122066, 0.015112659893929958, -0.0015336116775870323, -0.018107200041413307, 0.03797346353530884, 0.0007702888688072562, 0.0342683270573616, 0.023684769868850708, -0.0320594012737274...
{"umap_x": 4.2000274658203125, "umap_y": 2.5806143283843994}
learning_overparameterized_neural_networks_via_stochastic_gradient_descent_on_structured_data
https://arxiv.org/pdf/1808.01204
Prefix-Tuning: Optimizing Continuous Prompts for Generation
2,022
[["Xiang Lisa Li", "Stanford University"], ["Percy Liang", "Stanford University"]]
arXiv (Cornell University)
Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural lan...
false
3
Prefix-tuning improves algorithmic efficiency by optimizing a small continuous vector instead of full model parameters, reducing computational cost and storage needs. This contributes to making AI more accessible via minimal parameter updates, aligning with efficiency goals for trainers and deployers, though without ad...
{"algorithmic_efficiency": "Leverages small vector optimization for parameter reduction", "architectural_design": "No architectural change; retains original model structure", "data_processing_selection": "Not discussed; no data selection or preprocessing methods", "hardware_optimization": "Not addressed; no hardware co...
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [-0.0026586405001580715, 0.0015142711345106363, -0.02118261344730854, -0.004876960534602404, 0.015239257365465164, -0.008548552170395851, 0.01241075899451971, -0.00452186306938529, 0.019566277042031288, -0.011113758198916912, -0.05659214407205582, -0.010881579481065273, 0.0008818699861876667, -0.020460594...
{"umap_x": 2.351769208908081, "umap_y": 1.2215747833251953}
prefixtuning_optimizing_continuous_prompts_for_generation
https://arxiv.org/pdf/2101.00190
Know What You Don't Know: Unanswerable Questions for SQuAD
2,022
[["Pranav Rajpurkar", ""], ["Robin Jia", ""], ["Percy Liang", ""]]
arXiv (Cornell University)
Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context. Existing datasets either focus exclusively on answerable questions, or use automatically...
false
3
The paper addresses data processing selection by introducing adversarially crafted unanswerable questions to improve model robustness in detecting when answers are absent. While it does not focus on algorithms, architecture, or hardware, it contributes to AI efficiency by forcing models to handle uncertainty and avoid ...
{"algorithmic_efficiency": "Not applicable", "architectural_design": "Not applicable", "data_processing_selection": "Adversarial unanswerable question generation", "hardware_optimization": "Not applicable"}
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [-0.013091768138110638, -0.03176676481962204, 0.032827626913785934, 0.027005482465028763, 0.031125470995903015, 0.006606020033359528, -0.013743044808506966, -0.004503743257373571, 0.011306463740766048, 0.015532824210822582, -0.04322914779186249, -0.01072540134191513, 0.02385491505265236, -0.00309018604457...
{"umap_x": 0.2869495153427124, "umap_y": 2.0496487617492676}
know_what_you_dont_know_unanswerable_questions_for_squad
https://arxiv.org/pdf/1806.03822
Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines
2,022
[["Yen-Chang Hsu", "Georgia Institute of Technology"], ["Yen\u2010Cheng Liu", "Georgia Institute of Technology"], ["Zsolt Kira", "Georgia Institute of Technology"]]
arXiv (Cornell University)
Continual learning has received a great deal of attention recently with several approaches being proposed. However, evaluations involve a diverse set of scenarios making meaningful comparison difficult. This work provides a systematic categorization of the scenarios and evaluates them within a consistent framework incl...
false
1
The abstract addresses evaluation frameworks and baseline performance in continual learning but does not engage with algorithmic efficiency, architectural design, data processing, or hardware optimization. It does not offer concrete strategies for improving efficiency or accessibility for trainers and deployers in prac...
{"algorithmic_efficiency": "Minimal focus on model compression or low-precision training", "architectural_design": "No novel architecture proposed or optimized for speed/memory", "data_processing_selection": "No discussion of data selection, synthesis, or efficient labeling", "hardware_optimization": "No co-design with...
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [-0.014625205658376217, -0.001903163967654109, -0.018453290686011314, -0.000738292234018445, 0.027609808370471, 0.039946556091308594, 0.012857708148658276, -0.0029234064277261496, -0.02414843626320362, 0.025589922443032265, -0.030185570940375328, -0.02350064367055893, 0.0012261575320735574, -0.00863887462...
{"umap_x": 4.172281265258789, "umap_y": 2.1912193298339844}
reevaluating_continual_learning_scenarios_a_categorization_and_case_for_strong_baselines
https://arxiv.org/pdf/1810.12488
Parameter-Efficient Transfer Learning for NLP
2,022
[["Neil Houlsby", "Google (United States)"], ["Andrei Giurgiu", ""], ["Stanis\u0142aw Jastrz\u0229bski", "Universit\u00e9 de Montr\u00e9al"], ["Bruna Morrone", ""], ["Quentin de Laroussilhe", "Google (United States)"], ["Andr\u00e9a Gesmundo", "Google (United States)"], ["Mona Attariyan", ""], ["Sylvain Gelly", "Google...
arXiv (Cornell University)
Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extens...
false
4
The paper addresses algorithmic efficiency through adapter modules that significantly reduce trainable parameters per task, leveraging parameter sharing without altering base model architecture. This directly enhances efficiency in model adaptation across tasks, making it highly relevant to AI efficiency and accessibil...
{"algorithmic_efficiency": "Adapter modules enable parameter-efficient fine-tuning", "architectural_design": "Adapter modules integrate as lightweight extensions to base model", "data_processing_selection": "Not explicitly addressed in abstract", "hardware_optimization": "Not addressed in abstract"}
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [0.027658388018608093, 0.009653572924435139, 0.0006121136248111725, -0.003002958372235298, 0.013951271772384644, -0.015135643072426319, 0.008724924176931381, -0.03877105191349983, -0.004272460006177425, 0.008342714048922062, -0.037990178912878036, 0.007770479191094637, -0.01641257479786873, -0.02951675467...
{"umap_x": 2.8846914768218994, "umap_y": 1.8588330745697021}
parameterefficient_transfer_learning_for_nlp
https://arxiv.org/pdf/1902.00751
Learning multiple visual domains with residual adapters
2,022
[["Sylvestre-Alvise Rebuffi", "University of Oxford"], ["Hakan Bilen", "University of Edinburgh"], ["Andrea Vedaldi", "University of Oxford"]]
arXiv (Cornell University)
There is a growing interest in learning data representations that work well for many different types of problems and data. In this paper, we look in particular at the task of learning a single visual representation that can be successfully utilized in the analysis of very different types of images, from dog breeds to s...
false
3
The paper introduces an architectural design based on residual adapter modules that enables switching between visual domains with shared parameters, improving representational efficiency. While it contributes to model flexibility and efficient domain adaptation, it does not address data processing, hard...
{"algorithmic_efficiency": "Adapter modules enable efficient domain adaptation", "architectural_design": "Residual adapter architecture supports domain flexibility", "data_processing_selection": "No explicit data selection or synthesis discussed", "hardware_optimization": "Not addressed"}
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [0.0017677545547485352, 0.0018994346028193831, -0.004275087732821703, 0.013337694108486176, 0.02514459379017353, -0.001514033298008144, -0.006724846083670855, -0.0232501570135355, 0.032904740422964096, 0.00031386601040139794, -0.021580127999186516, 0.014449318870902061, -0.034032877534627914, -0.033972043...
{"umap_x": 3.405160427093506, "umap_y": 2.7284488677978516}
learning_multiple_visual_domains_with_residual_adapters
https://arxiv.org/pdf/1705.08045
Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
2,022
[["Rabeeh Karimi Mahabadi", "Idiap Research Institute"], ["James Henderson", "Idiap Research Institute"], ["Sebastian Ruder", "Google (United States)"]]
arXiv (Cornell University)
Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wastef...
false
4
Compacter improves algorithmic efficiency through low-rank operations and minimal trainable parameters, enabling efficient adaptation without full model fine-tuning. It directly addresses the research question by enhancing AI efficiency in training via parameter-efficient methods, particularly useful for trainers with ...
{"algorithmic_efficiency": "Low-rank adaptation with Kronecker products", "architectural_design": "Adapter layers with hypercomplex multiplication", "data_processing_selection": "NONE", "hardware_optimization": "NONE"}
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": ["alphatuning_quantizationaware_parameterefficient_adaptation_of_largescale_pretrained_language_models"]}
{"embedding": [0.023729851469397545, 0.007310544606298208, 0.008352437987923622, -0.0012474829563871026, 0.019677115604281425, 0.042796917259693146, 0.01088767684996128, -0.0019664259161800146, -0.009944587014615536, -0.012589478865265846, -0.05409243702888489, 0.014087783172726631, -0.026306690648198128, -0.0385413728...
{"umap_x": 3.1869301795959473, "umap_y": 1.812894582748413}
compacter_efficient_lowrank_hypercomplex_adapter_layers
https://arxiv.org/pdf/2106.04647
Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian
2,022
[["Samet Oymak", ""], ["Zalan Fabian", ""], ["Mingchen Li", ""], ["Mahdi Soltanolkotabi", ""]]
arXiv (Cornell University)
Modern neural network architectures often generalize well despite containing many more parameters than the size of the training dataset. This paper explores the generalization capabilities of neural networks trained via gradient descent. We develop a data-dependent optimization and generalization theory which leverages...
false
3
The paper analyzes generalization through Jacobian low-rank structure, showing that learning efficiency is driven by information space alignment, which relates to algorithmic efficiency and generalization behavior. While not directly addressing data processing, hardware, or architectural design specifics, it provides f...
{"algorithmic_efficiency": "Jacobian low-rank structure enables faster convergence", "architectural_design": "No specific architecture proposed; generalizes across architectures", "data_processing_selection": "No data selection or preprocessing strategies discussed", "hardware_optimization": "Not addressed in the paper...
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [0.02000451274216175, 0.009123745374381542, -0.007898847572505474, 0.012675692327320576, -0.02266932651400566, 0.03766566514968872, 0.009186984971165657, -0.008956479839980602, -0.007460697088390589, 0.027324751019477844, -0.0011622163001447916, 0.013505066744983196, 0.00711438711732626, -0.02269021607935...
{"umap_x": 4.289889812469482, "umap_y": 2.5077474117279053}
generalization_guarantees_for_neural_networks_via_harnessing_the_lowrank_structure_of_the_jacobian
https://arxiv.org/pdf/1906.05392
A Convergence Theory for Deep Learning via Over-Parameterization
2,022
[["Zeyuan Allen-Zhu", "Massachusetts Institute of Technology"], ["Yuanzhi Li", "Stanford University"], ["Zhao Song", "The University of Texas at Austin"]]
arXiv (Cornell University)
Deep neural networks (DNNs) have demonstrated dominating performance in many fields; since AlexNet, networks used in practice are going wider and deeper. On the theoretical side, a long line of works has been focusing on training neural networks with one hidden layer. The theory of multi-layer networks remains largely ...
false
3
The paper establishes theoretical guarantees for SGD convergence in over-parameterized networks, supporting algorithmic efficiency through polynomial-time training. While it touches on common architectures like ResNet and CNNs, it does not address data selection, hardware co-design, or model compression. Its relevance ...
{"algorithmic_efficiency": "Theoretical foundation for training efficiency in over-parameterized networks", "architectural_design": "Applies to CNNs, ResNets, and fully-connected networks with wide layers", "data_processing_selection": "None", "hardware_optimization": "None"}
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [0.020319290459156036, 0.028834838420152664, -0.006699996069073677, -0.016561489552259445, 0.011589408852159977, 0.007056766655296087, 0.010661774314939976, -0.02294219098985195, -0.024456987157464027, 0.047251734882593155, -0.003742802422493696, 0.011202190071344376, 0.004841162357479334, -0.027795201167...
{"umap_x": 4.285877704620361, "umap_y": 2.6235153675079346}
a_convergence_theory_for_deep_learning_via_overparameterization
https://arxiv.org/pdf/1811.03962
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
2,022
[["Pengcheng He", ""], ["Xiaodong Liu", ""], ["Jianfeng Gao", ""], ["Weizhu Chen", ""]]
arXiv (Cornell University)
Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techn...
false
3
The paper presents a novel architectural design with disentangled attention and a mask decoder, enhancing training efficiency and performance. While it improves algorithmic efficiency and architectural design, it does not address data processing selection or hardware optimization, making it relevant to efficiency and a...
{"algorithmic_efficiency": "Disentangled attention improves training efficiency", "architectural_design": "DeBERTa introduces disentangled attention and mask decoder", "data_processing_selection": "No explicit data selection or preprocessing discussed", "hardware_optimization": "None mentioned"}
{"references": ["lora_lowrank_adaptation_of_large_language_models"], "citations": []}
{"embedding": [-0.024526620283722878, 0.01903633400797844, 0.0035062257666140795, 0.04038066044449806, 0.033415038138628006, 0.0016649658791720867, -0.0386202447116375, -0.03424025699496269, 0.0043477644212543964, 0.009633335284888744, -0.010633913800120354, 0.00842992402613163, -0.013247573748230934, -0.00919651053845...
{"umap_x": 0.6972314715385437, "umap_y": 3.778632164001465}
deberta_decodingenhanced_bert_with_disentangled_attention
https://arxiv.org/pdf/2006.03654
A survey of GPT-3 family large language models including ChatGPT and GPT-4
2,023
[["Katikapalli Subramanyam Kalyan", ""]]
Natural Language Processing Journal
Large language models (LLMs) are a special class of pretrained language models (PLMs) obtained by scaling model size, pretraining corpus and computation. LLMs, because of their large size and pretraining on large volumes of text data, exhibit special abilities which allow them to achieve remarkable performances without...
false
2
The paper surveys GPT-3 family models' performance and data usage but does not discuss algorithmic compression, architectural optimization for speed, data reduction strategies, or hardware co-design. It lacks specific insights into efficiency or accessibility for practitioners in training or deployment.
{"algorithmic_efficiency": "Focuses on scaling via size and data, not efficiency techniques", "architectural_design": "Reviews transformer architecture, no novelty in design optimization", "data_processing_selection": "Mentions data labeling and augmentation, not data selection or reduction", "hardware_optimization": "...
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [0.012974121607840061, 0.03144601359963417, -0.00870821438729763, -0.0005060387193225324, 0.021184569224715233, 0.005799797363579273, 0.002272999379783869, -0.009255812503397465, 0.00695395702496171, 0.015045138075947762, -0.012457071803510189, -0.005759855732321739, 0.024977555498480797, -0.0268089193850...
{"umap_x": 0.0537099651992321, "umap_y": 2.7919604778289795}
a_survey_of_gpt3_family_large_language_models_including_chatgpt_and_gpt4
null
A Survey on Text Classification Algorithms: From Text to Predictions
2,022
[["Andrea Gasparetto", "Ca' Foscari University of Venice"], ["Matteo Marcuzzo", "Ca' Foscari University of Venice"], ["Alessandro Zangari", "Ca' Foscari University of Venice"], ["Andrea Albarelli", "Ca' Foscari University of Venice"]]
Information
In recent years, the exponential growth of digital documents has been met by rapid progress in text classification techniques. Newly proposed machine learning algorithms leverage the latest advancements in deep learning methods, allowing for the automatic extraction of expressive features. The swift development of thes...
false
3
The paper reviews deep learning-based text classification algorithms and architectures, highlighting data-to-label pipelines and model feature extraction. While it touches on algorithmic and architectural aspects relevant to AI efficiency, it lacks explicit discussion of compression, pruning, hardware co-design, or dat...
{"algorithmic_efficiency": "Focuses on deep learning models for feature extraction", "architectural_design": "Reviews deep learning architectures for text transformation", "data_processing_selection": "Addresses dataset synthesis and availability for multilabel tasks", "hardware_optimization": "None"}
{"references": ["a_survey_of_text_classification_with_transformers_how_wide_how_large_how_long_how_accurate_how_expensive_how_safe"], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.017592504620552063, 0.00503957225009799, -0.005149825941771269, 0.004226818680763245, 0.003205314977094531, -0.019972490146756172, 0.010171033442020416, -0.0478762611746788, 0.01969846524298191, 0.029557714238762856, -0.006240266840904951, -0.017160402610898018, 0.022067632526159286, -0.02934293262660...
{"umap_x": 0.07603665441274643, "umap_y": 4.33414363861084}
a_survey_on_text_classification_algorithms_from_text_to_predictions
https://www.mdpi.com/2078-2489/13/2/83/pdf?version=1644993135
polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics
2,023
[["Christopher Kuenneth", "Georgia Institute of Technology"], ["Rampi Ramprasad", "Georgia Institute of Technology"]]
Nature Communications
Abstract Polymers are a vital part of everyday life. Their chemical universe is so large that it presents unprecedented opportunities as well as significant challenges to identify suitable application-specific candidates. We present a complete end-to-end machine-driven polymer informatics pipeline that can search this ...
false
3
The paper introduces polyBERT, an NLP-inspired model that efficiently encodes polymer structures and enables fast property prediction, achieving high speed and accuracy. While it demonstrates algorithmic and architectural innovation with efficiency gains, it does not detail data selection strategies or hardware-specifi...
{"algorithmic_efficiency": "Uses lightweight model with efficient fingerprinting", "architectural_design": "Leverages NLP-inspired architecture for chemical data", "data_processing_selection": "No explicit data selection or preprocessing strategy mentioned", "hardware_optimization": "Mentions scalable cloud deployment ...
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [0.024278175085783005, 0.010873452760279179, -0.03432196006178856, 0.026419352740049362, 0.018336186185479164, 0.0009473113459534943, -0.017893169075250626, 0.000896571553312242, -0.018512625247240067, -0.0194235872477293, -0.01179574802517891, -0.008076991885900497, 0.0037403430324047804, -0.031805999577...
{"umap_x": 1.5547595024108887, "umap_y": 3.8128325939178467}
polybert_a_chemical_language_model_to_enable_fully_machinedriven_ultrafast_polymer_informatics
https://www.nature.com/articles/s41467-023-39868-6.pdf
ChatGPT in Healthcare: A Taxonomy and Systematic Review
2,023
[["Jianning Li", ""], ["Amin Dada", ""], ["Jens Kleesiek", ""], ["Jan Egger", ""]]
bioRxiv (Cold Spring Harbor Laboratory)
Abstract The recent release of ChatGPT, a chat bot research project / product of natural language processing (NLP) by OpenAI, stirs up a sensation among both the general public and medical professionals, amassing a phenomenally large user base in a short time. This is a typical example of the ‘productization’ of cuttin...
false
2
The paper discusses data selection through PubMed search and taxonomy but does not address algorithmic efficiency, architectural design, or hardware optimization. It evaluates ChatGPT’s performance in healthcare and concludes that general-purpose models lack reliability for clinical use, thus advocating for domain-spec...
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Not addressed", "data_processing_selection": "Data filtering via taxonomy and PubMed search", "hardware_optimization": "Not addressed"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.019800761714577675, 0.0052788760513067245, 0.003064245916903019, -0.003378210123628378, 0.009886492043733597, 0.006786635145545006, -0.02393675595521927, -0.02772388979792595, 0.03150951489806175, -0.018624858930706978, -0.04164622724056244, -0.0067632608115673065, 0.028406471014022827, -0.01116777397...
{"umap_x": -1.111103892326355, "umap_y": 3.0901684761047363}
chatgpt_in_healthcare_a_taxonomy_and_systematic_review
https://www.medrxiv.org/content/medrxiv/early/2023/03/30/2023.03.30.23287899.full.pdf
Large language models (LLMs): survey, technical frameworks, and future challenges
2,024
[["Pranjal Kumar", "Lovely Professional University"]]
Artificial Intelligence Review
Artificial intelligence (AI) has significantly impacted various fields. Large language models (LLMs) like GPT-4, BARD, PaLM, Megatron-Turing NLG, Jurassic-1 Jumbo etc., have contributed to our understanding and application of AI in these domains, along with natural language processing (NLP) techniques. This work provid...
false
3
The paper covers foundational LLM architectures and applications but does not address algorithmic efficiency, data processing strategies, or hardware co-optimization. While it provides context on model design and use cases, it lacks technical details on efficiency improvements or hardware-aware adaptations crucial to t...
{"algorithmic_efficiency": "Limited discussion on compression or quantization", "architectural_design": "Discusses key architectures like transformers and variants", "data_processing_selection": "No mention of data selection, synthesis, or preprocessing", "hardware_optimization": "No focus on hardware co-design or devi...
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [0.014430579729378223, 0.02031988650560379, -0.02171722613275051, 0.02510249987244606, 0.01407384779304266, 0.01496240682899952, -0.015633996576070786, -0.03361469507217407, 0.0021016555838286877, 0.007373699452728033, 0.004464720841497183, -0.007907294668257236, 0.017938047647476196, -0.01634729653596878...
{"umap_x": -7.425800868077204e-05, "umap_y": 2.7811806201934814}
large_language_models_llms_survey_technical_frameworks_and_future_challenges
null
Scaling Self-Supervised Learning for Histopathology with Masked Image Modeling
2,023
[["Alexandre Filiot", ""], ["Ridouane Ghermi", ""], ["Antoine Olivier", ""], ["Paul Jacob", ""], ["Lucas Fidon", ""], ["Alice Mac Kain", ""], ["Charlie Saillard", ""], ["Jean-Baptiste Schiratti", ""]]
bioRxiv (Cold Spring Harbor Laboratory)
Computational pathology is revolutionizing the field of pathology by integrating advanced computer vision and machine learning technologies into diagnostic workflows. It offers unprecedented opportunities for improved efficiency in treatment decisions by allowing pathologists to achieve higher precision and objectivity...
false
3
The paper addresses algorithmic efficiency through masked image modeling, architectural design via ViT-based transformers, and data processing by using large-scale unlabelled histology data. While it advances AI efficiency in medical imaging, it does not explicitly consider hardware co-design or lightweight model deplo...
{"algorithmic_efficiency": "Uses masked image modeling for efficient self-supervised learning", "architectural_design": "Employs vision transformer architecture optimized for histopathology", "data_processing_selection": "Leverages large unlabelled WSI datasets via self-supervised pre-training", "hardware_optimization"...
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.027072347700595856, 0.01363370567560196, -0.022217145189642906, 0.004354829899966717, 0.028044795617461205, -0.00374165759421885, 0.01989474706351757, -0.028971735388040543, 0.018507257103919983, 0.0049958559684455395, -0.02076709270477295, 0.04449456185102463, 0.020413938909769058, -0.009637680836021...
{"umap_x": 2.7693443298339844, "umap_y": 3.1412429809570312}
scaling_selfsupervised_learning_for_histopathology_with_masked_image_modeling
null
What Artificial Neural Networks Can Tell Us about Human Language Acquisition
2,022
[["Alex Warstadt", ""], ["Samuel R. Bowman", ""]]
Rapid progress in machine learning for natural language processing has the potential to transform debates about how humans learn language. However, the learning environments and biases of current artificial learners and humans diverge in ways that weaken the impact of the evidence obtained from learning simulations. For 
false
2
The abstract does not discuss algorithmic efficiency, architectural design, or hardware optimization. It addresses data processing by advocating for reduced data advantage in model training to match human input levels, emphasizing fairness and limited linguistic exposure. While indirectly related to data processing thr...
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Not addressed", "data_processing_selection": "Addresses data burden via model fairness and reduced linguistic input", "hardware_optimization": "Not addressed"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [0.04354175180196762, 0.003036086680367589, -0.004366141743957996, 0.005305599886924028, -0.024515444412827492, -0.005617661401629448, -0.0012807794846594334, -0.04672032222151756, -0.007714225910604, 0.019473593682050705, -0.005074197892099619, -0.021935487166047096, -0.001774424221366644, -0.00799196958...
{"umap_x": -0.15746378898620605, "umap_y": 1.6555384397506714}
what_artificial_neural_networks_can_tell_us_about_human_language_acquisition
https://api.taylorfrancis.com/content/chapters/edit/download?identifierName=doi&identifierValue=10.1201/9781003205388-2&type=chapterpdf
When brain-inspired AI meets AGI
2,023
[["Lin Zhao", "University of Georgia"], ["Lu Zhang", "The University of Texas at Arlington"], ["Zihao Wu", "University of Georgia"], ["Yuzhong Chen", "University of Electronic Science and Technology of China"], ["Haixing Dai", "University of Georgia"], ["Xiaowei Yu", "The University of Texas at Arlington"], ["Zhenglian...
Meta-Radiology
Artificial General Intelligence (AGI) has been a long-standing goal of humanity, with the aim of creating machines capable of performing any intellectual task that humans can do. To achieve this, AGI researchers draw inspiration from the human brain and seek to replicate its principles in intelligent machines. Brain-in...
false
3
The abstract touches on algorithmic efficiency through in-context learning and prompt tuning, which reduce computational demands. However, it does not detail architectural designs, data selection strategies, or hardware co-design. Overall, it provides a broad overview of brain-inspired AI with limited specifics on effi...
{"algorithmic_efficiency": "Discusses in-context learning and prompt tuning for efficiency", "architectural_design": "Mentions evolving architectures without specific design innovations", "data_processing_selection": "None", "hardware_optimization": "None"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models", "easy_and_efficient_transformer_scalable_inference_solution_for_large_nlp_model"]}
{"embedding": [0.03341023623943329, 0.03229876980185509, 0.0010979629587382078, 0.005597755312919617, -0.013354325667023659, 0.0012321614194661379, 0.011319451965391636, 6.426158506656066e-05, -0.016052985563874245, 0.010264997370541096, -0.03248685970902443, -0.02260810136795044, 0.01576267182826996, -0.00619671214371...
{"umap_x": -0.5531836748123169, "umap_y": 1.8255252838134766}
when_braininspired_ai_meets_agi
null
Creating and Comparing Dictionary, Word Embedding, and Transformer-Based Models to Measure Discrete Emotions in German Political Text
2,022
[["Tobias Widmann", "Aarhus University"], ["Maximilian Wich", "Technical University of Munich"]]
Political Analysis
Abstract Previous research on emotional language relied heavily on off-the-shelf sentiment dictionaries that focus on negative and positive tone. These dictionaries are often tailored to nonpolitical domains and use bag-of-words approaches which come with a series of disadvantages. This paper creates, validates, and co...
false
3
The paper compares algorithmic approaches for emotional measurement in political text, highlighting efficiency gains from transformer models over rule-based or embedding-based methods. While it addresses algorithmic efficiency and architectural design through model choice, it does not focus on data processing, hardware...
{"algorithmic_efficiency": "Transformer models outperform dictionary and embedding baselines", "architectural_design": "Transformer models used with pretrained language models", "data_processing_selection": "No explicit data selection or preprocessing strategies mentioned", "hardware_optimization": "NONE"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.029637373983860016, 0.015831323340535164, 0.012979862280189991, 0.004948255605995655, 0.02771153301000595, 0.009059200063347816, 0.029568415135145187, -0.051434796303510666, 0.02656562626361847, 0.008108302019536495, -0.02797241508960724, 0.04264182597398758, -0.02637375518679619, -0.01932110264897346...
{"umap_x": -0.4938794672489166, "umap_y": 4.629763603210449}
creating_and_comparing_dictionary_word_embedding_and_transformerbased_models_to_measure_discrete_emotions_in_german_political_text
https://www.cambridge.org/core/services/aop-cambridge-core/content/view/2DA41C0F09DE1CA600B3DCC647302637/S1047198722000158a.pdf/div-class-title-creating-and-comparing-dictionary-word-embedding-and-transformer-based-models-to-measure-discrete-emotions-in-german-political-text-div.pdf
Deep Learning in Sentiment Analysis: Recent Architectures
2,022
[["Tariq Abdullah", "University of Derby"], ["Ahmed Ahmet", "University of Derby"]]
ACM Computing Surveys
Humans are increasingly integrated with devices that enable the collection of vast unstructured opinionated data. Accurately analysing subjective information from this data is the task of sentiment analysis (an actively researched area in NLP). Deep learning provides a diverse selection of architectures to model sentim...
false
3
The abstract highlights architectural shifts toward Transformers in sentiment analysis, covering design and implementation trends. While it touches on architectural design, it does not discuss algorithmic efficiency, data processing, or hardware optimization, limiting its direct relevance to the research question on ef...
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Focus on Transformer models and their design evolution", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.02104872092604637, 0.0017603926826268435, 0.005801396444439888, 0.014363342896103859, 0.02071497030556202, 0.006188490428030491, 0.0096374386921525, -0.05768631771206856, 0.02824646607041359, 0.021150413900613785, 0.011505277827382088, -0.026743877679109573, -0.006251558195799589, -0.0163529384881258,...
{"umap_x": -0.306428998708725, "umap_y": 4.603011131286621}
deep_learning_in_sentiment_analysis_recent_architectures
null
Rethinking Positional Encoding in Language Pre-training
2,022
[["Guolin Ke", "Microsoft Research (United Kingdom)"], ["Di He", "Microsoft Research (United Kingdom)"], ["Tie\u2010Yan Liu", "Microsoft Research (United Kingdom)"]]
arXiv (Cornell University)
In this work, we investigate the positional encoding methods used in language pre-training (e.g., BERT) and identify several problems in the existing formulations. First, we show that in the absolute positional encoding, the addition operation applied on positional embeddings and word embeddings brings mixed correlatio...
false
3
The paper improves algorithmic efficiency and architectural design by decoupling positional and word embeddings, reducing noise and enhancing model expressiveness. While it advances model design and training efficiency, it does not address data processing, selection, or hardware co-design, limiting its direct relevance...
{"algorithmic_efficiency": "Separates word and position correlations to reduce noise", "architectural_design": "Introduces TUPE architecture with untied positional encoding", "data_processing_selection": "None", "hardware_optimization": "None"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [0.008879571221768856, -0.0003273600013926625, -0.024976974353194237, 0.014783686958253384, 0.044173967093229294, 0.003709569573402405, 0.006877189036458731, -0.0342818908393383, 0.010974491946399212, 0.006727635394781828, -0.05852049961686134, -0.0112210838124156, -0.025299830362200737, -0.00130534637719...
{"umap_x": 1.1456602811813354, "umap_y": 3.5937225818634033}
rethinking_positional_encoding_in_language_pretraining
https://arxiv.org/pdf/2006.15595
Overview and Discussion of the Competition on Legal Information Extraction/Entailment (COLIEE) 2021
2,022
[["Juliano Rabelo", "University of Alberta"], ["Randy Goebel", "University of Alberta"], ["Miyoung Kim", "University of Alberta"], ["Yoshinobu Kano", "Shizuoka University"], ["Masaharu Yoshioka", "Hokkaido University"], ["Ken Satoh", "National Institute of Informatics"]]
The Review of Socionetwork Strategies
Abstract We summarize the 8th Competition on Legal Information Extraction and Entailment. In this edition, the competition included five tasks on case law and statute law. The case law component includes an information retrieval Task (Task 1), and the confirmation of an entailment relation between an existing case and ...
false
1
The abstract describes a legal information extraction competition with tasks on retrieval and entailment but does not mention algorithmic efficiency, architectural design, data processing, or hardware optimization. It focuses on competition structure, task definitions, and submission results, with no discussion of effi...
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Not addressed", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.006417465396225452, 0.010493048466742039, 0.011246011592447758, 0.048855435103178024, 0.009128862991929054, 0.03295442834496498, 0.015573103912174702, -0.07034562528133392, 0.017373772338032722, -0.005488635040819645, -0.007009129971265793, 0.012384048663079739, -0.008963610976934433, 0.00697545474395...
{"umap_x": 0.602614164352417, "umap_y": 1.661129117012024}
overview_and_discussion_of_the_competition_on_legal_information_extractionentailment_coliee_2021
https://link.springer.com/content/pdf/10.1007/s12626-022-00105-z.pdf
Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models
2,023
[["Yuyan Chen", "Fudan University"], ["Qiang Fu", "Microsoft Research Asia (China)"], ["Yichen Yuan", ""], ["Zhihao Wen", "Singapore Management University"], ["Ge Fan", "Tencent (China)"], ["Dayiheng Liu", ""], ["Dongmei Zhang", ""], ["Zhixu Li", "Fudan University"], ["Yanghua Xiao", "Fudan University"]]
Large language models (LLMs) have gained widespread adoption in various natural language processing tasks, including question answering and dialogue systems. However, a major drawback of LLMs is the issue of hallucination, where they generate unfaithful or inconsistent content that deviates from the input source, leadi...
false
1
The paper focuses on hallucination detection using a discriminator model trained on synthetic datasets, with no emphasis on algorithmic efficiency, architectural design, data processing, or hardware optimization. It is relevant to LLM reliability but does not address the research question's core concerns around efficie...
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Not addressed", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [0.007299056276679039, 0.01004844345152378, -0.014094232581555843, 0.027852075174450874, 0.04257062450051308, 0.019266195595264435, -0.012149098329246044, -0.026364734396338463, 0.048224806785583496, 0.013509436510503292, -0.03825647756457329, 0.020841609686613083, 0.033062923699617386, -0.020468423143029...
{"umap_x": -0.37184613943099976, "umap_y": 2.561249256134033}
hallucination_detection_robustly_discerning_reliable_answers_in_large_language_models
null
FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning
2,022
[["Jing Zhou", "Tsinghua University"], ["Yanan Zheng", "Beijing Academy of Artificial Intelligence"], ["Jie Tang", "Beijing Academy of Artificial Intelligence"], ["Jian Li", "Tsinghua University"], ["Zhilin Yang", "Beijing Academy of Artificial Intelligence"]]
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Most previous methods for text data augmentation are limited to simple tasks and weak baselines. We explore data augmentation on hard tasks (i.e., few-shot natural language understanding) and strong baselines (i.e., pretrained models with over one billion parameters). Under this setting, we reproduced a large number of...
false
3
The paper addresses data processing selection by introducing FlipDA, a novel method for generating label-flipped data that improves few-shot learning performance. It directly contributes to efficient data augmentation strategies, reducing reliance on costly or ineffective augmentation techniques, thus aligning with the...
{"algorithmic_efficiency": "Not applicable", "architectural_design": "Not applicable", "data_processing_selection": "Proposes label-flipped data generation for few-shot learning", "hardware_optimization": "Not applicable"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [0.011422394774854183, 0.05543661117553711, 0.014270022511482239, -0.0019042122876271605, 0.02294972911477089, 0.02017134241759777, 0.00040233039180748165, 0.016044598072767258, 0.02138257585465908, 0.02908994071185589, -0.02619410865008831, 0.014897292479872704, 0.0005134834791533649, -0.0354761257767677...
{"umap_x": 1.6002916097640991, "umap_y": 1.776157021522522}
flipda_effective_and_robust_data_augmentation_for_fewshot_learning
https://aclanthology.org/2022.acl-long.592.pdf
MisRoBÆRTa: Transformers versus Misinformation
2,022
[["Ciprian\u2010Octavian Truic\u0103", "Universitatea Na\u021bional\u0103 de \u0218tiin\u021b\u0103 \u0219i Tehnologie Politehnica Bucure\u0219ti"], ["Elena\u2010Simona Apostol", "Universitatea Na\u021bional\u0103 de \u0218tiin\u021b\u0103 \u0219i Tehnologie Politehnica Bucure\u0219ti"]]
Mathematics
Misinformation is considered a threat to our democratic values and principles. The spread of such content on social media polarizes society and undermines public discourse by distorting public perceptions and generating social unrest while lacking the rigor of traditional journalism. Transformers and transfer learning ...
false
3
The paper addresses algorithmic efficiency through DistilRoBERTa, a lightweight variant enabling faster training. Architectural design involves combining two transformers for enhanced performance. Data processing uses a large, carefully curated dataset. However, hardware optimization is absent. This work directly contr...
{"algorithmic_efficiency": "DistilRoBERTa used for faster training and inference", "architectural_design": "MisRoB\u00c6RTa combines BART and RoBERTa for improved detection", "data_processing_selection": "Larger dataset (100k records) with manual labeling for accuracy", "hardware_optimization": "NONE"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.025914661586284637, 0.036560431122779846, 0.0005638849106617272, 0.028658084571361542, 0.015198864042758942, 0.008483489975333214, -0.007737389765679836, -0.02971881628036499, -0.0010318084387108684, -0.003915903624147177, 0.019827067852020264, 0.03691563382744789, 0.0038963777478784323, -0.0018976198...
{"umap_x": -1.183401107788086, "umap_y": 4.681253910064697}
misrobærta_transformers_versus_misinformation
https://www.mdpi.com/2227-7390/10/4/569/pdf?version=1645069852
AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories
2,024
[["Max Pellert", "University of Mannheim"], ["Clemens M. Lechner", "GESIS - Leibniz-Institute for the Social Sciences"], ["Claudia Wagner", "Complexity Science Hub Vienna"], ["Beatrice Rammstedt", "GESIS - Leibniz-Institute for the Social Sciences"], ["Markus Strohmaier", "Complexity Science Hub Vienna"]]
Perspectives on Psychological Science
We illustrate how standard psychometric inventories originally designed for assessing noncognitive human traits can be repurposed as diagnostic tools to evaluate analogous traits in large language models (LLMs). We start from the assumption that LLMs, inadvertently yet inevitably, acquire psychological traits (metaphor...
false
1
The abstract does not discuss algorithmic efficiency, architectural design, data processing/selection, or hardware optimization. Instead, it focuses on evaluating LLMs' psychological traits via psychometric inventories, which is unrelated to efficiency or accessibility for trainers and deployers. This paper is thematic...
{"algorithmic_efficiency": "not addressed", "architectural_design": "not addressed", "data_processing_selection": "not addressed", "hardware_optimization": "not addressed"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.00583705073222518, 0.008296000771224499, -0.03690626844763756, 0.00822544563561678, 0.0063613057136535645, 0.01388748548924923, 0.02741893380880356, -0.019468063488602638, -0.018291017040610313, -0.016214432194828987, -0.05289950594305992, 0.0025878483429551125, 0.013202909380197525, -0.04791624844074...
{"umap_x": -1.3543967008590698, "umap_y": 3.5318751335144043}
ai_psychometrics_assessing_the_psychological_profiles_of_large_language_models_through_psychometric_inventories
https://journals.sagepub.com/doi/pdf/10.1177/17456916231214460
Metaphorian: Leveraging Large Language Models to Support Extended Metaphor Creation for Science Writing
2,023
[["Jeongyeon Kim", "Stanford University"], ["Sangho Suh", "University of California, San Diego"], ["Lydia B. Chilton", "Columbia University"], ["Haijun Xia", "University of California, San Diego"]]
Science writers commonly use extended metaphors to communicate unfamiliar concepts in a more accessible way to a wider audience. However, creating metaphors for science writing is challenging even for professional writers; according to our formative study (n=6), finding inspiration and extending metaphors with coherent...
false
1
The abstract does not address algorithmic efficiency, architectural design, data processing or selection, or hardware optimization. It focuses on supporting metaphor creation in science writing using LLMs, with no mention of efficiency or optimization for training/inference or deployment. The paper contributes to creat...
{"algorithmic_efficiency": "Not applicable", "architectural_design": "Not applicable", "data_processing_selection": "Not applicable", "hardware_optimization": "Not applicable"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.004506122320890427, -7.042632205411792e-05, -0.03519697114825249, 0.000827884185127914, -0.0016674867365509272, -0.015847928822040558, -0.030832162126898766, -0.03627129644155502, 0.005241787526756525, -0.014839858748018742, -0.035694632679224014, 0.030167439952492714, 0.01814304105937481, -0.03708535...
{"umap_x": -0.4504256248474121, "umap_y": 3.6390228271484375}
metaphorian_leveraging_large_language_models_to_support_extended_metaphor_creation_for_science_writing
https://dl.acm.org/doi/pdf/10.1145/3563657.3595996
Domain-matched Pre-training Tasks for Dense Retrieval
2,022
[["Barlas O\u011fuz", ""], ["Kushal Lakhotia", ""], ["Anchit Gupta", ""], ["Patrick Lewis", ""], ["Vladimir Karpukhin", ""], ["Aleksandra Piktus", ""], ["Xilun Chen", ""], ["Sebastian Riedel", ""], ["Scott Yih", ""], ["Sonal Gupta", ""], ["Yashar Mehdad", ""]]
Findings of the Association for Computational Linguistics: NAACL 2022
Barlas Oguz, Kushal Lakhotia, Anchit Gupta, Patrick Lewis, Vladimir Karpukhin, Aleksandra Piktus, Xilun Chen, Sebastian Riedel, Scott Yih, Sonal Gupta, Yashar Mehdad. Findings of the Association for Computational Linguistics: NAACL 2022. 2022.
false
3
The paper improves efficiency through domain-matched pre-training tasks that reduce data needs and computational load, aligning with algorithmic efficiency and data processing selection. It contributes to making dense retrieval more accessible by optimizing training via targeted tasks, thus enhancing efficiency for tra...
{"algorithmic_efficiency": "Focus on task-specific pre-training for efficiency", "architectural_design": "No novel architecture introduced", "data_processing_selection": "Domain-matched tasks reduce data burden", "hardware_optimization": "Not addressed"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.01217617653310299, 0.024933915585279465, 0.01707259751856327, 0.008290918543934822, 0.009074531495571136, 0.011485516093671322, 0.012136830948293209, -0.07165570557117462, 0.0223823394626379, -0.0071282899007201195, -0.029115818440914154, 0.0028870897367596626, 0.029613742604851723, 0.0006364234141074...
{"umap_x": 1.3987603187561035, "umap_y": 2.3331382274627686}
domainmatched_pretraining_tasks_for_dense_retrieval
https://aclanthology.org/2022.findings-naacl.114.pdf
ProSST: Protein Language Modeling with Quantized Structure and Disentangled Attention
2,024
[["Mingchen Li", "Shanghai Artificial Intelligence Laboratory"], ["Pan Tan", "Shanghai Jiao Tong University"], ["Xinzhu Ma", "Shanghai Artificial Intelligence Laboratory"], ["Bozitao Zhong", "Shanghai Jiao Tong University"], ["Huiqun Yu", "East China University of Science and Technology"], ["Ziyi Zhou", "Shanghai Jiao ...
bioRxiv (Cold Spring Harbor Laboratory)
Abstract Protein language models (PLMs) have shown remarkable capabilities in various protein function prediction tasks. However, while protein function is intricately tied to structure, most existing PLMs do not incorporate protein structure information. To address this issue, we introduce ProSST, a Transformer-based ...
false
4
The paper enhances algorithmic efficiency through structure quantization and improves architectural design via disentangled attention. It addresses protein modeling with efficient representation learning but does not involve data selection strategies or hardware co-design. Overall, it contributes to AI efficiency in bi...
{"algorithmic_efficiency": "Includes structure quantization for reduced computational cost", "architectural_design": "Uses disentangled attention for efficient sequence-structure modeling", "data_processing_selection": "None explicitly mentioned; relies on pre-existing structure data", "hardware_optimization": "None ad...
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [0.0030241510830819607, -0.014192852191627026, 0.0030177412554621696, 0.01471181120723486, 0.0054884785786271095, 0.020192909985780716, -0.011094696819782257, -0.009574761614203453, -0.016773032024502754, -0.011261426843702793, -0.015257734805345535, 0.0361027829349041, 0.009924027137458324, -0.0375433377...
{"umap_x": 1.6513904333114624, "umap_y": 3.619230031967163}
prosst_protein_language_modeling_with_quantized_structure_and_disentangled_attention
https://www.biorxiv.org/content/biorxiv/early/2024/04/17/2024.04.15.589672.full.pdf
The perils and promises of fact-checking with large language models
2,024
[["Dorian Quelle", "University of Zurich"], ["Alexandre Bovet", "University of Zurich"]]
Frontiers in Artificial Intelligence
Automated fact-checking, using machine learning to verify claims, has grown vital as misinformation spreads beyond human fact-checking capacity. Large language models (LLMs) like GPT-4 are increasingly trusted to write academic papers, lawsuits, and news articles and to verify information, emphasizing their role in dis...
false
1
The abstract does not discuss algorithmic efficiency, architectural design, data processing/selection, or hardware optimization. It focuses on LLM capabilities in fact-checking, reasoning, and source citation, which are outside the scope of AI efficiency or deployability concerns for trainers or deployers.
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Not addressed", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [0.0012512452667579055, 0.02730763517320156, 0.013868384063243866, 0.0189997386187315, 0.002599006984382868, 0.05265124514698982, -0.005658098962157965, -0.04347313567996025, 0.0032116996590048075, -0.0024067300837486982, -0.016676941886544228, 0.004297120030969381, 0.00953723955899477, -0.008191094733774...
{"umap_x": -0.24826078116893768, "umap_y": 2.094883680343628}
the_perils_and_promises_of_factchecking_with_large_language_models
https://www.frontiersin.org/articles/10.3389/frai.2024.1341697/pdf?isPublishedV2=False
Large-scale text analysis using generative language models: A case study in discovering public value expressions in AI patents
2,024
[["Sergio Pelaez", "Georgia Institute of Technology"], ["Gaurav Verma", "Georgia Institute of Technology"], ["B\u00e1rbara Ribeiro", "SKEMA Business School"], ["Philip Shapira", "University of Manchester"]]
Quantitative Science Studies
Abstract We put forward a novel approach using a generative language model (GPT-4) to produce labels and rationales for large-scale text analysis. The approach is used to discover public value expressions in patents. Using text (5.4 million sentences) for 154,934 US AI patent documents from the United States Patent and...
false
3
The paper addresses algorithmic efficiency through prompt-based automation reducing manual labeling effort; architectural design involves BERT-based classification for performance; data processing uses large-scale text with curated prompts for selection. While not focused on hardware, it contributes to efficient AI dep...
{"algorithmic_efficiency": "Leverages GPT-4 for automated labeling with minimal computational overhead", "architectural_design": "Uses BERT-based classifiers trained on GPT-4-generated labels for downstream tasks", "data_processing_selection": "Processes 5.4M sentences from AI patents using semi-automated prompt-driven...
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [0.00592697411775589, 0.004875773563981056, -3.652020495792385e-06, 0.005924909375607967, 0.030092110857367516, 0.0060779801569879055, 0.003985063172876835, -0.030292291194200516, -0.012391828931868076, -0.029933612793684006, -0.006823635660111904, 0.003076930297538638, 0.007882999256253242, -0.0276225320...
{"umap_x": -0.7747442126274109, "umap_y": 3.2773759365081787}
largescale_text_analysis_using_generative_language_models_a_case_study_in_discovering_public_value_expressions_in_ai_patents
https://direct.mit.edu/qss/article-pdf/doi/10.1162/qss_a_00285/2325312/qss_a_00285.pdf
Large Language Models, Agency, and Why Speech Acts are Beyond Them (For Now) – A Kantian-Cum-Pragmatist Case
2,024
[["Reto Gubelmann", "University of St.Gallen"]]
Philosophy & Technology
Abstract This article sets in with the question whether current or foreseeable transformer-based large language models (LLMs), such as the ones powering OpenAI’s ChatGPT, could be language users in a way comparable to humans. It answers the question negatively, presenting the following argument. Apart from niche uses, ...
false
1
The abstract does not discuss algorithmic efficiency, architectural design, data processing, or hardware optimization. It focuses on philosophical arguments about agency and intentionality in LLMs, not technical improvements for efficiency or accessibility. Thus, it is irrelevant to the research question about AI effic...
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Not addressed", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [0.021508637815713882, 0.027501096948981285, -0.008208287879824638, 0.01283835805952549, -0.002055569551885128, 0.00970974937081337, -0.04114846512675285, -0.03293260559439659, -0.02944047935307026, 0.006984157022088766, -0.02816951274871826, -0.003195451572537422, -0.0005917831440456212, 0.00377036374993...
{"umap_x": -0.8092983365058899, "umap_y": 2.075796604156494}
large_language_models_agency_and_why_speech_acts_are_beyond_them_for_now_–_a_kantiancumpragmatist_case
https://link.springer.com/content/pdf/10.1007/s13347-024-00696-1.pdf
ShuttleNet: Position-Aware Fusion of Rally Progress and Player Styles for Stroke Forecasting in Badminton
2022
[["Wei\u2010Yao Wang", "National Yang Ming Chiao Tung University"], ["Hong-Han Shuai", "National Yang Ming Chiao Tung University"], ["Kai-Shiang Chang", "National Yang Ming Chiao Tung University"], ["Wen-Chih Peng", "National Yang Ming Chiao Tung University"]]
Proceedings of the AAAI Conference on Artificial Intelligence
The increasing demand for analyzing the insights in sports has stimulated a line of productive studies from a variety of perspectives, e.g., health state monitoring, outcome prediction. In this paper, we focus on objectively judging what and where to return strokes, which is still unexplored in turn-based sports. By fo...
false
2
The paper introduces a novel architectural design focused on position-aware fusion for stroke forecasting in badminton, but does not address algorithmic efficiency, data processing, or hardware co-design. While it improves model performance through architectural innovation, it lacks explicit optimizations for efficienc...
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Position-aware fusion with encoder-decoder", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.002738985000178218, 0.019920438528060913, -0.014743691310286522, 0.015218502841889858, 0.030754126608371735, -0.005747226066887379, 0.014209982939064503, -0.018920795992016792, 0.010034826584160328, -0.007855110801756382, -0.04019472002983093, -0.0014688376104459167, -0.017337141558527946, 0.005074150...
{"umap_x": 1.868045687675476, "umap_y": 4.391332149505615}
shuttlenet_positionaware_fusion_of_rally_progress_and_player_styles_for_stroke_forecasting_in_badminton
https://ojs.aaai.org/index.php/AAAI/article/download/20341/20100
Supervising Model Attention with Human Explanations for Robust Natural Language Inference
2022
[["Joe Stacey", "Imperial College London"], ["Yonatan Belinkov", "Technion \u2013 Israel Institute of Technology"], ["Marek Rei", "Imperial College London"]]
Proceedings of the AAAI Conference on Artificial Intelligence
Natural Language Inference (NLI) models are known to learn from biases and artefacts within their training data, impacting how well they generalise to other unseen datasets. Existing de-biasing approaches focus on preventing the models from learning these biases, which can result in restrictive models and lower perform...
false
2
The paper enhances model generalisation through human-explanation-supervised attention, improving focus on relevant words and reducing reliance on biased data patterns. While it contributes to robustness and generalisation in NLI, it does not address algorithmic efficiency, architectural design, data processing, or har...
{"algorithmic_efficiency": "Attention supervision improves generalisation", "architectural_design": "No novel architecture proposed", "data_processing_selection": "No data selection or synthesis strategy", "hardware_optimization": "No hardware co-design or optimization"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.022161465138196945, 0.0160599984228611, -0.03451065346598625, 0.012909555807709694, -0.00516332034021616, 0.027515575289726257, -0.05055977776646614, -0.05299412086606026, -0.02921631745994091, 0.00928509421646595, -0.012959280982613564, 0.015668943524360657, -0.012826543301343918, -0.0149529995396733...
{"umap_x": 0.7219666838645935, "umap_y": 1.562085509300232}
supervising_model_attention_with_human_explanations_for_robust_natural_language_inference
https://ojs.aaai.org/index.php/AAAI/article/download/21386/21135
LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification
2022
[["Jiangjie Chen", "Fudan University"], ["Qiaoben Bao", "Fudan University"], ["Changzhi Sun", ""], ["Xinbo Zhang", ""], ["Jiaze Chen", ""], ["Hao Zhou", ""], ["Yanghua Xiao", "Fudan University"], ["Lei Li", "University of California, Santa Barbara"]]
Proceedings of the AAAI Conference on Artificial Intelligence
Given a natural language statement, how to verify its veracity against a large-scale textual knowledge source like Wikipedia? Most existing neural models make predictions without giving clues about which part of a false claim goes wrong. In this paper, we propose LOREN, an approach for interpretable fact verification. ...
false
1
The abstract does not discuss algorithmic efficiency, architectural design, data processing or hardware optimization. It focuses solely on interpretability in fact verification via logical regularization at the phrase level, which relates to explainability rather than efficiency or accessibility for trainers or deploye...
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Not addressed", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.005086598917841911, 0.024319414049386978, 0.004899897612631321, -0.005469687283039093, 0.00039128822390921414, 0.019893966615200043, 0.001100019202567637, -0.03398891165852547, -0.021074313670396805, 0.008808120153844357, 0.012545563280582428, 0.03464466705918312, -0.005266675725579262, -0.01527217775...
{"umap_x": 0.18589679896831512, "umap_y": 1.6079062223434448}
loren_logicregularized_reasoning_for_interpretable_fact_verification
https://ojs.aaai.org/index.php/AAAI/article/download/21291/21040
Performance-Efficiency Trade-Offs in Unsupervised Pre-Training for Speech Recognition
2022
[["Felix Wu", ""], ["Kwangyoun Kim", ""], ["Jing Pan", ""], ["Kyu J. Han", ""], ["Kilian Q. Weinberger", "Cornell University"], ["Yoav Artzi", "Cornell University"]]
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW-D (Squeezed a...
false
4
The paper addresses algorithmic efficiency and architectural design by proposing SEW-D, a streamlined wav2vec 2.0 variant that improves both inference speed and accuracy. It is highly relevant to the research question by offering a model that reduces computational cost while maintaining or enhancing performance, direct...
{"algorithmic_efficiency": "Improves inference speed via efficient architecture", "architectural_design": "Introduces SEW-D with disentangled attention for efficiency", "data_processing_selection": "NONE", "hardware_optimization": "NONE"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [0.017381926998496056, 0.014231828041374683, -0.015020892955362797, 0.014188352972269058, 0.025755994021892548, -0.0032366113737225533, -0.0067627886310219765, -0.03009369596838951, -0.019591063261032104, 0.022525573149323463, -0.02917603962123394, 0.024912983179092407, -0.007595833856612444, -0.024060290...
{"umap_x": 2.3966407775878906, "umap_y": 3.610037088394165}
performanceefficiency_tradeoffs_in_unsupervised_pretraining_for_speech_recognition
null
i-Code: An Integrative and Composable Multimodal Learning Framework
2023
[["Ziyi Yang", "Microsoft Research (United Kingdom)"], ["Yuwei Fang", "Microsoft Research (United Kingdom)"], ["Chenguang Zhu", "Microsoft Research (United Kingdom)"], ["Reid Pryzant", "Microsoft Research (United Kingdom)"], ["Dongdong Chen", "Microsoft Research (United Kingdom)"], ["Yu Shi", "Microsoft Research (Unite...
Proceedings of the AAAI Conference on Artificial Intelligence
Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision,...
false
3
The paper enhances algorithmic efficiency through novel attention mechanisms and efficient modality integration. Its architectural design enables flexible, scalable multimodal processing. Data processing is adaptive across modalities without selective filtering or synthesis. While it advances multimodal learning, it do...
{"algorithmic_efficiency": "Uses co-attention and masked modeling for efficient integration", "architectural_design": "Introduces multimodal fusion via merge-coattention mechanisms", "data_processing_selection": "Processes single, dual, and triple modality inputs flexibly", "hardware_optimization": "NONE"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.003666869131848216, 0.015671158209443092, -0.00237352610565722, -0.023600082844495773, 0.010085460729897022, 0.0037898400332778692, -0.0036690719425678253, -0.00832286011427641, 0.0056694066151976585, 0.01294013112783432, 0.009430120699107647, 0.0154147082939744, 0.003266164567321539, -0.0216760300099...
{"umap_x": 0.31780409812927246, "umap_y": 4.945220947265625}
icode_an_integrative_and_composable_multimodal_learning_framework
https://ojs.aaai.org/index.php/AAAI/article/download/26290/26062
Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective
2023
[["Adaku Uchendu", "Pennsylvania State University"], ["Thai Le", "University of Mississippi"], ["Dongwon Lee", "Pennsylvania State University"]]
ACM SIGKDD Explorations Newsletter
Two interlocking research questions of growing interest and importance in privacy research are Authorship Attribution (AA) and Authorship Obfuscation (AO). Given an artifact, especially a text t in question, an AA solution aims to accurately attribute t to its true author out of many candidate authors while an AO solut...
false
1
The abstract does not discuss algorithmic efficiency, architectural design, data processing or selection, or hardware optimization. It focuses on authorship attribution and obfuscation in neural text generation, which is unrelated to AI efficiency or accessibility for trainers and deployers.
{"algorithmic_efficiency": "Not addressed", "architectural_design": "Not addressed", "data_processing_selection": "Not addressed", "hardware_optimization": "Not addressed"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.02400008775293827, 0.028910644352436066, 0.0015737270005047321, 0.023309772834181786, -0.002934356452897191, -0.00957245472818613, 0.024075409397482872, -0.012827705591917038, 0.0131770558655262, -0.004058739636093378, 0.014941973611712456, 0.018898537382483482, 0.017127925530076027, -0.00922052562236...
{"umap_x": 0.222743421792984, "umap_y": 3.8580453395843506}
attribution_and_obfuscation_of_neural_text_authorship_a_data_mining_perspective
null
DISCO: Distilling Counterfactuals with Large Language Models
2023
[["Zeming Chen", "\u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne"], ["Qiyue Gao", "Allen Institute"], ["Antoine Bosselut", "\u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne"], ["Ashish Sabharwal", "Allen Institute"], ["Kyle Richardson", "Allen Institute"]]
Models trained with counterfactually augmented data learn representations of the causal structure of tasks, enabling robust generalization. However, high-quality counterfactual data is scarce for most tasks and not easily generated at scale. When crowdsourced, such data is typically limited in scale and diversity; when...
false
3
DISCO improves algorithmic efficiency as well as data processing and selection by distilling high-quality counterfactual data at scale, enabling smaller, more robust models. While it enhances model efficiency through data augmentation and knowledge distillation, it does not directly address hardware co-design or novel architectur...
{"algorithmic_efficiency": "Uses distillation to reduce training cost", "architectural_design": "Applies distillation to lightweight student models", "data_processing_selection": "Generates scalable counterfactual data via prompt engineering and filtering", "hardware_optimization": "NONE"}
{"references": [], "citations": ["lora_lowrank_adaptation_of_large_language_models"]}
{"embedding": [-0.014818478375673294, 0.008917762897908688, -0.028475536033511162, 0.02041071467101574, 0.030975036323070526, 0.02635367587208748, -0.023850150406360626, -0.03273388370871544, -0.011569029651582241, 0.004305985290557146, -0.02641247771680355, 0.0056459978222846985, -0.003465550020337105, -0.008733628317...
{"umap_x": 1.3048679828643799, "umap_y": 0.7485260963439941}
disco_distilling_counterfactuals_with_large_language_models
https://aclanthology.org/2023.acl-long.302.pdf