Abstract: Significant scholarly effort has been dedicated to defining the rule of law. The prevailing view in the literature is that the rule of law is a highly multidimensional and, as some suggest, an essentially contested concept. In this study, we employ advanced text-as-data methods, specifically diachronic word embeddings, to shed light on what the rule of law means and how its meaning has evolved over a century through parliamentary speeches in the UK and the US. We categorize the conceptualization of the rule of law into thin (procedural) and thick (substantive) definitions. Our findings indicate that procedural elements, such as rules and judiciary, maintain a strong and relatively more stable association with the rule of law. In contrast, substantive elements, which include rights and democratic principles, have become relatively less associated with the rule of law over time. Despite this decline, the rights component remains critically important to the concept, broadly equivalent in significance to procedural aspects. Because our analysis is confined to parliamentary debates from the UK and the US, the findings should be interpreted with caution when generalizing to other political contexts. PubDate: 2025-06-19
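As a rough illustration of the diachronic word-embedding approach described above, the sketch below trains one embedding space per period and tracks how strongly a merged rule_of_law token associates with procedural and substantive seed terms. It assumes gensim; the corpora and seed terms are toy placeholders, not the paper's data.

```python
# Sketch: train one embedding space per time slice, then track how strongly
# "rule of law" associates with procedural vs. substantive seed terms.
from gensim.models import Word2Vec

# Toy tokenized corpora, one per period; real input would be parliamentary
# speeches with "rule of law" collapsed into the single token "rule_of_law".
corpora = {
    "1900-1950": [
        ["the", "rule_of_law", "requires", "clear", "rules"],
        ["courts", "and", "the", "judiciary", "uphold", "rule_of_law"],
    ] * 50,
    "1950-2000": [
        ["rule_of_law", "protects", "rights", "and", "democracy"],
        ["parliament", "debates", "the", "rule_of_law", "and", "rules"],
    ] * 50,
}

procedural = ["rules", "judiciary"]    # thin-concept seed terms (illustrative)
substantive = ["rights", "democracy"]  # thick-concept seed terms (illustrative)

for period, sents in corpora.items():
    model = Word2Vec(sents, vector_size=50, window=3, min_count=1, seed=0)
    for term in procedural + substantive:
        if term in model.wv and "rule_of_law" in model.wv:
            sim = model.wv.similarity("rule_of_law", term)
            print(f"{period}: sim(rule_of_law, {term}) = {sim:.2f}")
```

Note that serious diachronic studies typically align the per-period vector spaces (e.g. via orthogonal Procrustes) before comparing similarities across time.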
Abstract: The advancement of natural language processing (NLP) has expanded the application of AI-based text classification in the legal domain. However, accurately classifying legal documents remains a challenging task due to the complexity of legal texts and the subtle differences between legal categories. This study conducts a comprehensive evaluation of various legal text classification models, ranging from traditional machine learning techniques to state-of-the-art large language models (LLMs), based on ten legal categories of sexual offense precedents. The experimental results demonstrate that fine-tuning small-scale models (Small LMs) such as KLUE-BERT on legal data yields superior performance compared to large general-purpose models such as GPT-3.5 and GPT-4.0, as well as traditional machine learning models. In particular, KLUE-BERT achieved the highest accuracy of 99.3%, indicating that domain adaptation and fine-tuning play a more crucial role in legal document classification than model size alone. Furthermore, we employed explainable AI (XAI) techniques to analyze the model’s predictions and conduct an in-depth review of misclassification cases. XAI-based analysis allowed us to identify key linguistic features influencing model decisions and revealed limitations in the model’s ability to fully capture subtle textual cues. To further validate these findings, we leveraged KICS data, which closely resembles real-world legal case records, as a testbed to evaluate the model’s generalization capabilities. The results indicate that the model struggles to interpret implicit contextual cues within legal texts. These findings emphasize the necessity for both high performance and interpretability in legal AI models. By utilizing XAI, this study proposes methods to enhance transparency in legal text classification, contributing to ongoing discussions on the reliability and practical application of AI in legal contexts. Our research suggests that AI-assisted tools can effectively support legal professionals in tasks such as legal document classification, legal information retrieval, and case assessment. PubDate: 2025-05-28
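A minimal fine-tuning sketch in the spirit of the study, assuming the Hugging Face transformers library and the public klue/bert-base checkpoint; the texts, labels, and hyperparameters are placeholders rather than the paper's setup.

```python
# Sketch: fine-tune KLUE-BERT for legal text classification.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
import torch

tokenizer = AutoTokenizer.from_pretrained("klue/bert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "klue/bert-base", num_labels=10)  # ten legal categories, as in the paper

texts = ["...case text...", "...case text..."]  # placeholder documents
labels = [0, 1]                                 # placeholder category ids
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class CaseDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in enc.items()}
        item["labels"] = torch.tensor(labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=CaseDataset(),
)
trainer.train()
```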
Abstract: Binding precedents (súmulas vinculantes) constitute a juridical instrument unique to the Brazilian legal system, whose objectives include protecting the Federal Supreme Court against repetitive demands. Studies of the effectiveness of these instruments in decreasing the Court’s exposure to similar cases, however, indicate that they tend to fall short of this goal, with some binding precedents seemingly creating new demands. We empirically assess the legal impact of five binding precedents, 11, 14, 17, 26, and 37, at the highest Court level through their effects on the legal subjects they address. This analysis requires comparing the Court’s rulings on the precedents’ themes from before the precedents were created, which means that these decisions must be detected through techniques of Similar Case Retrieval, a problem we tackle from the angle of Case Classification. The contributions of this article are therefore twofold: on the mathematical side, we compare different Natural Language Processing methods — TF-IDF, LSTM, Longformer, and regex — for Case Classification, whereas on the legal side, we contrast the inefficiency of these binding precedents with a set of hypotheses that may justify their repeated usage. We observe that the TF-IDF models performed slightly better than LSTM and Longformer on common metrics; however, the deep learning models were able to detect certain important legal events that TF-IDF missed. On the legal side, we argue that the reasons binding precedents fail to curb repetitive demand are heterogeneous and case-dependent, making it impossible to single out a specific cause. We identify five main hypotheses, which appear in different combinations in each of the precedents studied. PubDate: 2025-05-26
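A small sketch of the TF-IDF classification baseline, assuming scikit-learn; the training decisions and labels are invented placeholders.

```python
# Sketch of the TF-IDF case-classification baseline: does a decision touch a
# given binding precedent's theme?
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["decisão sobre prisão...", "outro tema..."]  # placeholder
train_labels = [1, 0]  # 1 = related to the precedent's theme

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                    LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_labels)
print(clf.predict(["nova decisão do tribunal..."]))
```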
Abstract: In this article, we propose the R2GQA system, a retriever-reader-generator question answering system consisting of three main components: Document Retriever, Machine Reader, and Answer Generator. The Document Retriever module employs advanced information retrieval techniques to extract the context of articles from a dataset of legal regulation documents. The Machine Reader module utilizes state-of-the-art natural language understanding algorithms to comprehend the retrieved documents and extract answers. Finally, the Generator module synthesizes the extracted answers into concise and informative responses to students’ questions regarding legal regulations. Furthermore, we built the ViRHE4QA dataset in the domain of university training regulations, comprising 9,758 question-answer pairs constructed through a rigorous process. This is the first Vietnamese dataset in the higher education regulations domain with various types of answers, both extractive and abstractive. In addition, the R2GQA system is the first system to offer abstractive answers in Vietnamese. This paper discusses the design and implementation of each module within the R2GQA system on the ViRHE4QA dataset, highlighting their functionalities and interactions. Furthermore, we present experimental results demonstrating the effectiveness and utility of the proposed system in supporting students’ comprehension of legal regulations in higher education settings. In general, the R2GQA system and the ViRHE4QA dataset promise to contribute significantly to related research and to help students navigate complex legal documents and regulations, empowering them to make informed decisions and adhere to institutional policies effectively. Our dataset is available for research purposes. PubDate: 2025-05-22
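The retriever-reader part of such a pipeline can be sketched as follows, assuming the rank_bm25 and transformers packages; the documents are invented, and the default English QA model stands in for the Vietnamese models the paper uses.

```python
# Sketch of a retriever-reader pipeline in the spirit of R2GQA: BM25 retrieves
# a relevant article, then an extractive QA model reads it.
from rank_bm25 import BM25Okapi
from transformers import pipeline

docs = [
    "Students must register for courses before the semester starts.",
    "A GPA below 1.0 for two consecutive semesters leads to dismissal.",
]
bm25 = BM25Okapi([d.lower().split() for d in docs])

question = "When must students register for courses?"
context = bm25.get_top_n(question.lower().split(), docs, n=1)[0]  # retriever

reader = pipeline("question-answering")  # English default; swap for Vietnamese
print(reader(question=question, context=context)["answer"])      # reader
# An answer generator (a seq2seq model) would then rewrite this extractive span
# into an abstractive response.
```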
Abstract: Large language models (LLMs), trained on extensive datasets, demonstrate an exceptional ability to handle textual data, yielding desired outputs solely through prompts and demonstrations and eliminating the need for extensive fine-tuning. Despite their potential, integrating LLMs into the legal domain poses challenges, necessitating high-level reasoning, efficient understanding of complex linguistic structures, and faithful reflection of legal precedents. Tailoring prompting paradigms for LLMs with reasoning techniques that address legal nuances enhances their precision and relevance in generating legal text. Despite advancements in language processing, challenges persist, particularly the lack of domain-specific knowledge in LLMs. Approaches such as integrating external information retrieval systems and fine-tuning with domain-specific data aim to mitigate this gap. This research provides a comprehensive exploration of adaptive strategies aimed at increasing the reasoning capabilities of LLMs within the legal domain. The paper delves into various prompt engineering methodologies, approaches to incorporating external knowledge, and frameworks for evaluating LLM responses, with the aim of inspiring further research in prompt engineering specifically tailored for legal applications while addressing prevailing research challenges. PubDate: 2025-05-15
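As an illustration of the kind of reasoning-oriented prompt engineering surveyed here, the sketch below builds a chain-of-thought style legal prompt; call_llm is a hypothetical stand-in for whatever chat-completion API is used, and the scenario is invented.

```python
# Sketch of a chain-of-thought style legal prompt.
def build_prompt(facts: str, question: str) -> str:
    return (
        "You are a legal assistant. Reason step by step before answering.\n"
        f"Facts: {facts}\n"
        f"Question: {question}\n"
        "Step 1 - Identify the governing rule.\n"
        "Step 2 - Apply the rule to the facts.\n"
        "Step 3 - State the conclusion, citing the rule applied.\n"
    )

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: wire up your model provider here.
    raise NotImplementedError

prompt = build_prompt(
    facts="The tenant withheld rent after the landlord failed to repair heating.",
    question="Can the landlord terminate the lease?",
)
# answer = call_llm(prompt)
```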
Abstract: The rise of large language models (LLMs) such as ChatGPT and GPT-4, developed by OpenAI, has generated significant interest in the legal domain due to their sophisticated language processing capabilities. In particular, regions like China are vigorously developing legal-specific LLMs for legal purposes. Fine-tuned with fewer parameters on judicial documents and Chinese case datasets, these specialized LLMs are widely expected to meet practical needs in the judicial field more effectively. However, the ability of these law-specific LLMs to perform legal tasks, and their potential to outperform general LLMs, has not yet been established. To fill this research gap, we systematically evaluate a range of general and legal-specific LLMs on various legal tasks. The results show that GPT-4 maintains superior performance on most legal tasks, although legal-specific LLMs show superior performance in specific cases. This study provides insight into the factors leading to these results, hoping to enrich the discourse on the use of LLMs in the legal field. PubDate: 2025-05-14
Abstract: Business contracts, particularly sale and purchase agreements, often contain a large number of clauses and are correspondingly long and complex. In practice, it is therefore a great challenge to keep track of their legal context and to identify and avoid inconsistencies in such contracts. Against this background, we describe a method and tool called ContractCheck which allows for the consistency analysis of legal contracts, in particular share purchase agreements (SPAs). In order to identify the concepts that are relevant for an analysis, we define an ontology for SPAs. The analysis is then based on an encoding of the preconditions for the execution of the clauses of an SPA, as well as on a set of proposed consistency constraints formalized using decidable fragments of first-order logic (FOL). Based on the ontology for SPAs, textual SPAs are first encoded in a structured natural language format that we refer to as “blocks”. ContractCheck interprets these blocks and constraints and translates them into assertions formulated in FOL. It then invokes a Satisfiability Modulo Theories (SMT) solver in order to check the executability of the considered contract, either by providing a satisfying model or by proving the existence of conflicting clauses that prevent the contract from being executed. We illustrate the application of ContractCheck to concrete SPAs, including one example of an SPA of realistic size and complexity, and conclude by suggesting directions for future research. PubDate: 2025-05-13
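The core SMT step can be sketched with the z3-solver Python bindings: clause preconditions become Boolean assertions, and the solver either returns a satisfying model (the contract is executable) or shows the constraints conflict. The clauses below are invented and far simpler than ContractCheck's FOL encoding.

```python
# Sketch of SMT-based clause-consistency checking: encode clause preconditions
# as logical assertions and ask the solver whether they can all hold.
from z3 import Solver, Bool, Implies, Not, sat

closing_occurs = Bool("closing_occurs")
price_paid = Bool("price_paid")
shares_transferred = Bool("shares_transferred")

s = Solver()
s.add(Implies(closing_occurs, price_paid))           # clause 1: closing requires payment
s.add(Implies(closing_occurs, shares_transferred))   # clause 2: closing transfers shares
s.add(Implies(shares_transferred, Not(price_paid)))  # clause 3: deliberately conflicting
s.add(closing_occurs)                                # try to execute the contract

if s.check() == sat:
    print("executable, model:", s.model())
else:
    print("conflicting clauses: contract cannot be executed")
```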
Abstract: This study addresses the growing concern about the inclusion of abusive clauses in consumer contracts, exacerbated by the proliferation of online services with complex Terms of Service that are rarely read. Although research on automatic analysis methods exists, detecting such clauses is made harder by the field’s general focus on English-language machine learning approaches and on major jurisdictions, such as the European Union. We introduce a new methodology and a substantial Spanish-language dataset addressing this gap. We propose a novel annotation scheme with four categories and 20 classes and apply it to 50 online Terms of Service used in Chile. Our evaluation of transformer-based models highlights how factors like language- and/or domain-specific pre-training, few-shot sample size, and model architecture affect the detection and classification of potentially abusive clauses. Results show large variability in performance across tasks and models: the highest macro-F1 scores for the detection task range from 79% to 89% with micro-F1 scores up to 96%, while macro-F1 scores for the classification task range from 60% to 70% and micro-F1 scores from 64% to 80%. Notably, this is the first Spanish-language multi-label classification dataset for legal clauses that applies Chilean law and offers a comprehensive evaluation of Spanish-language models in the legal domain. Our work lays the groundwork for future research on method development for rarely considered legal analysis and can lead to practical applications supporting consumers in Chile and Latin America as a whole. PubDate: 2025-05-09
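The gap between the reported macro- and micro-F1 scores reflects class imbalance; the toy scikit-learn example below shows how the two averages diverge when a model ignores rare classes.

```python
# Toy multi-label example: micro-F1 is dominated by the frequent class, while
# macro-F1 weighs every class equally.
from sklearn.metrics import f1_score
import numpy as np

# rows = clauses, cols = abusive-clause classes (invented indicator matrix)
y_true = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0]])

print("micro-F1:", f1_score(y_true, y_pred, average="micro"))  # 0.75
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))  # ~0.33
```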
Abstract: Open-texture—e.g. vague, ambiguous, under-specified, or abstract terms—in regulatory documents leads to inconsistent interpretation and is an obstacle to the automatic processing of regulation by computers. Identifying which parts of a legal text fall under open-texture is therefore a necessary step toward automating the law. In this paper, we propose that large language models (LLMs) might provide an effective way to automatically detect open-texture in legal texts. We first investigate the obstacles by situating open-texture in the broader literature, and we test the hypothesis using two different LLMs—the proprietary gpt-3.5-turbo and the open-source llama-2-70b-chat—on the task of identifying open-texture in the General Data Protection Regulation. We evaluate their performance by asking 12 annotators to assess their output. We find, overall, that gpt-3.5-turbo outperforms llama-2-70b-chat on F1-scores (0.84 vs 0.67), and its high F1-score could make it a suitable alternative, or complement, to human annotators. We also test the sensitivity of the findings against four further LLMs combined with six different prompts, and replicate the finding that there is low agreement between annotators when it comes to identifying open-texture. We conclude the article by discussing the subjectivity of open-texture, the lessons to draw when testing for open-texture, and the consequences of using LLMs in the legal domain. PubDate: 2025-05-06
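The low inter-annotator agreement reported here is typically quantified with a chance-corrected statistic; a minimal sketch using Cohen's kappa from scikit-learn follows, with invented span labels.

```python
# Sketch: measure agreement between two annotators labelling spans as
# open-textured (1) or not (0).
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 1, 1, 0, 1, 0, 0]
annotator_b = [1, 1, 0, 1, 0, 0, 0, 1]

print("kappa:", cohen_kappa_score(annotator_a, annotator_b))
# A low kappa here mirrors the paper's finding that humans often disagree on
# what counts as open-texture.
```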
Abstract: Extractive summaries of legal judgments are invaluable because they preserve the original text, facilitating direct reference when required. Existing research on extractive summarization of Indian legal judgments predominantly employs supervised methodologies, which rely on extensive and often difficult-to-obtain expert annotations. State-of-the-art methods that use role labels or rhetorical role labels are largely limited to supervised learning paradigms (Bhattacharya et al. in Identification of rhetorical roles of sentences in Indian legal judgments, 2019). In this study, we propose an unsupervised approach, the Unsupervised Role-Labelled Knapsack Summarizer (URL KnapSum), which eliminates the need for expert annotations while effectively scaling to larger datasets. Our methodology begins by clustering sentences into thematic categories. An optimized selection of sentences is then performed with a knapsack algorithm over the similarity scores and length constraints, ensuring cluster-level diversity and global coherence. We evaluate URL KnapSum using ROUGE (Lin in ROUGE: a package for automatic evaluation of summaries. In: Text summarization branches out, Barcelona, Spain. Association for Computational Linguistics, pp 74–81, 2004) scores for supervised evaluation, and Kendall’s tau (Puka in Kendall’s Tau. Springer, Berlin, Heidelberg, pp 713–715, 2011) and Spearman’s correlation (Sedgwick in BMJ: Br Med J 349:g7327, 2014) for unsupervised evaluation. Additionally, we introduce a novel, reference-free evaluation metric, the Top-K Analysis Metric, which benchmarks the algorithm by measuring consistency between document and summary similarities. The results demonstrate the superiority of URL KnapSum over existing supervised methods, highlighting its effectiveness and scalability. PubDate: 2025-05-06
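The knapsack step can be sketched as a standard 0/1 dynamic program: maximize total sentence relevance under a summary length budget. The scores and lengths below are illustrative, not outputs of the paper's clustering stage.

```python
# Sketch of the knapsack step: pick sentences maximizing total relevance score
# while the summary stays within a length budget.
def knapsack_select(scores, lengths, budget):
    """0/1 knapsack DP over sentences; returns chosen sentence indices."""
    n = len(scores)
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]                      # skip sentence i-1
            if lengths[i - 1] <= b:                      # or take it, if it fits
                cand = dp[i - 1][b - lengths[i - 1]] + scores[i - 1]
                if cand > dp[i][b]:
                    dp[i][b] = cand
    chosen, b = [], budget                               # backtrack the choices
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    return sorted(chosen)

scores = [0.9, 0.3, 0.7, 0.2]   # illustrative sentence relevance scores
lengths = [30, 12, 25, 5]       # sentence lengths in words
print(knapsack_select(scores, lengths, budget=40))  # -> [0, 3]
```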
Abstract: Studies on the evidential foundations of probabilistic reasoning are extended using the notion of weight of evidence to measure evidential phenomena in the presence of a mass of evidence giving rise to complex reasoning patterns. The main results provided in this paper are methods to measure inferential interactions and dissonances among items of evidence. All measures are defined to ensure versatile applicability in inferential tasks involving the combination of evidence and a mass of evidence. These measures enable a detailed examination of recurrent phenomena in evidence-based reasoning, such as convergence, contradiction, redundancy, and synergy. Most of these phenomena have—as far as the authors are aware—either not been formally described, or the formal descriptions proposed have been of limited use for evidence-based reasoning tasks. The present research addresses this deficit in the current understanding and treatment of these evidential phenomena. It is shown by way of examples that incorrect consideration of evidential phenomena can lead to substantial misrepresentations of the value of evidence. PubDate: 2025-05-03
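For orientation, the weight-of-evidence notion used in this literature is usually Good's log-likelihood ratio, W(H : E) = log[ P(E | H) / P(E | ¬H) ]; roughly, synergy between two items E1 and E2 then corresponds to the joint weight exceeding the sum of the individual weights, and redundancy to it falling short. This gloss is standard background, not the paper's own formal definitions.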
Abstract: In a world of human-only readers, a trade-off persists between comprehensiveness and comprehensibility: only privacy policies too long to be humanly readable can precisely describe the intended data processing. We argue that this trade-off no longer exists where LLMs are able to extract tailored information from clearly drafted, fully comprehensive privacy policies. To substantiate this claim, we provide a methodology for drafting comprehensive, non-ambiguous privacy policies and for querying them using LLM prompts. Our methodology is tested with an experiment aimed at determining to what extent GPT-4 and Llama2 are able to answer questions regarding the content of privacy policies designed in the format we propose. We further support this claim by analyzing real privacy policies in the chosen market sectors through two experiments (one with legal experts, and another using LLMs). Based on the success of our experiments, we submit that data protection law should change: it must require controllers to provide clearly drafted, fully comprehensive privacy policies from which data subjects and other actors can extract the needed information with the help of LLMs. PubDate: 2025-04-28
Abstract: The automatic summarization of judgment documents is a challenging task due to their length and the dispersed nature of the important information they contain. The prevailing approach to summarizing lengthy documents integrates both extractive and abstractive summarization models. However, current extractive models struggle to capture all essential details because pertinent information is scattered throughout judgment documents. Additionally, existing abstractive models still grapple with the problem of "hallucinations", which leads to generating inaccurate information. In our work, we propose a novel hybrid legal summarization method that incorporates legal domain knowledge into both the extractive and abstractive models. The method consists of two parts: (1) the rhetorical role of sentences is identified by a sentence-level sequence labeling method, and the rhetorical information is integrated into a WoBERT-based extractive model through conditional normalization, ensuring that the identification of key sentences is both precise and complete; (2) the pre-trained model RoFormer is combined with Seq2Seq to construct a long-text summarization model, and prior knowledge from external resources and the document itself is introduced into the decoding process to improve the faithfulness and coherence of the composed summary. In addition, a contrastive learning strategy is employed during training to enhance the robustness of the abstractive model. Experimental results on the CAIL2020 dataset show that the proposed model is superior to the baseline methods. Furthermore, our method outperforms GPT and other LLMs in processing judgment documents. PubDate: 2025-04-22
Abstract: We propose the task of legal question generation (QG) as an application in Legal NLP. Specifically, the task is to generate a question given a context and an optional keyword. We create the first dataset for the QG task in the legal domain, called LegalQ, consisting of 2,023 pairs spanning the legal systems of multiple countries and multiple languages. We then use this dataset to benchmark several Large Language Models (LLMs), including Turbo-GPT-3.5, GPT-4, Llama2-70b, Llama2-13b, and Aalap-Mistral-7b (a legal domain-specific LLM). We also fine-tune several open-source LLMs such as T5, BART, Pegasus, and Flan-T5, which improves results over zero-shot prompting of LLMs. We also use in-context learning (via few-shot examples) to generate questions of varying types and difficulty levels. Furthermore, we introduce a novel domain-specific prompting strategy based on chain-of-thought prompting for question generation. We then perform a Bloom’s Taxonomy analysis of the questions generated by the LLMs, showing that ‘understanding’ and ‘remembering’ are the two most dominant types of questions generated. Human evaluation of the generated questions shows promise in terms of generating grammatically correct, relevant, appropriate, complex, and novel questions. Finally, we analyze the incomplete or unanswerable generated questions to find possible reasons for these issues. PubDate: 2025-04-15
Abstract: In the civil law tradition, legal arguments are used to justify the outcomes of judicial decision-making. These arguments are formed relying on a canon of interpretation techniques (e.g. textual or teleological interpretation). We study the identifiability of interpretation techniques as they are employed by the European Court of Human Rights (ECtHR) from a computational law perspective using a unique dataset. We show how Large Language Models (LLMs) can be utilized to classify legal interpretations, and we compare their performance. We evaluate proprietary and open-source models using methods such as few-shot and zero-shot chain-of-thought prompting combined with self-consistency. Our results imply that feature extraction using LLMs leads to robust outcomes while allowing for greater resource- and time-efficiency compared to human annotation. Furthermore, our results imply that LLMs can play a larger role in the extraction of more complex features that are of particular relevance from a legal perspective. PubDate: 2025-04-15
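Self-consistency, mentioned above, samples several chain-of-thought completions and majority-votes the final label; a minimal sketch follows, where sample_llm is a hypothetical stand-in for a sampling chat-completion call.

```python
# Sketch of self-consistency: sample n reasoning paths, keep the majority label.
from collections import Counter

def sample_llm(prompt: str, temperature: float = 0.7) -> str:
    # Hypothetical placeholder: wire up your model provider here.
    raise NotImplementedError

def self_consistent_label(prompt: str, n: int = 5) -> str:
    votes = [sample_llm(prompt) for _ in range(n)]  # n sampled reasoning paths
    return Counter(votes).most_common(1)[0][0]      # majority label wins

# label = self_consistent_label("Classify the interpretation technique: ...")
```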
Abstract: An interdisciplinary approach to legal education incorporates courses from other disciplines into the teaching of law. Interdisciplinary coursework is essential for law students to develop a complete outlook on legal issues, and the interdisciplinary approach supports more effective legal analysis. The objective of this study is to explore the role of interdisciplinary approaches in revolutionizing legal education for the 21st century, focusing on their benefits for enhancing students’ problem-solving skills and innovative thinking. Both quantitative and qualitative research methods were used. A total of 800 structured questionnaires were distributed to individuals; data collection lasted nearly 27 days, and 785 valid responses were received. The data were analyzed using SPSS software, and the valid responses were used to assess the role of interdisciplinary approaches in legal education. The research found that interdisciplinary approaches in legal education develop students’ problem-solving skills and innovative thinking, improve students’ ability to solve complex problems comprehensively, and increase graduates’ ability to adapt to the changing demands of the 21st-century legal profession. The study highlights innovative thinking, understanding of complex legal issues, graduates’ adaptability, and greater competence. Its novelty lies in scrutinizing the advantages of interdisciplinary approaches to revolutionizing legal education for the 21st century. PubDate: 2025-04-07
Abstract: Norms are essential in our society: they dictate how individuals should behave and interact within a community. They can be written down in laws or other written sources. Interpretations often differ; this is where formalisations offer a solution, expressing an interpretation of a source of norms in a transparent manner. However, creating these interpretations is labour intensive. Natural language processing techniques can support this process; previous work showed the potential of transformer-based models for Dutch law texts. In this paper, we (1) introduce a dataset of 2335 English sentences annotated with legal semantic roles conforming to the Flint framework; (2) fine-tune a collection of language models on this dataset; and (3) query two non-fine-tuned generative large language models (LLMs). This allows us to compare the performance of fine-tuned domain-specific, task-specific, and general language models with non-fine-tuned generative LLMs. The results show that models fine-tuned on our dataset have the best performance (accuracy around 0.88). Furthermore, domain-specific models perform better than general models, indicating that domain knowledge adds value for this task. Finally, the different methods of querying LLMs perform unsatisfactorily, with maximum accuracy scores around 0.6. This indicates that for specific tasks, such as this adaptation of semantic role labelling, annotating data and fine-tuning a smaller language model is preferable to querying a generative LLM, especially when domain-specific models are available. PubDate: 2025-03-24
Abstract: Legal document analysis presents significant challenges due to its complexity and domain-specific nature. This study introduces an innovative approach for classifying Indian court judgments into legal domains using diverse machine learning and deep learning techniques. The method incorporates feature engineering and deep learning algorithms for extracting meaningful features, supported by a wide range of classifiers, including voting classifiers, gradient boosting, and random forest. Embeddings are generated using models such as InLegalBERT, InCaseLawBERT, CustomInLawBERT, Mamba, T5, RoBERTa, CodeT5, SBERT, DistilBERT, XLM, XLM Large, LegalBERT, GPT2, ALBERT, Electra, DeBERTa, TFDeBERTa, FlanT5, FlanT5-Large, BART, BigBird Pegasus, LongFormer, and LUKE. To address class imbalance, the SMOTE technique is employed, and dimensionality is reduced using PCA and forward feature selection. The T5+SMOTE+feature selection+voting classifier configuration achieves a notable accuracy of 98%, highlighting the effectiveness of the proposed approach. These advancements have significant implications for applications such as document retrieval, legal discovery, and case law analysis, enhancing the accuracy and efficiency of legal document classification. PubDate: 2025-03-24
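A compressed sketch of the SMOTE + PCA + voting-classifier stage, assuming scikit-learn and imbalanced-learn; the embedding matrix is random placeholder data standing in for features from models such as T5 or InLegalBERT.

```python
# Sketch: rebalance with SMOTE, reduce with PCA, classify with a soft-voting
# ensemble over tree-based and linear models.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.decomposition import PCA
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))      # placeholder document embeddings
y = np.array([0] * 180 + [1] * 20)  # imbalanced legal-domain labels

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)  # rebalance classes
X_red = PCA(n_components=16).fit_transform(X_bal)        # reduce dimensions

vote = VotingClassifier([
    ("rf", RandomForestClassifier(random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
], voting="soft")
vote.fit(X_red, y_bal)
print("train accuracy:", vote.score(X_red, y_bal))
```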
Abstract: Identification of rhetorical roles like facts, arguments, and final judgments is central to understanding a legal case document and can lend power to other downstream tasks like legal case summarization and judgment prediction. However, there are several challenges to this task. Legal documents are often unstructured and contain a specialized vocabulary, making it hard for conventional transformer models to understand them. Additionally, these documents run into several pages, which makes it difficult for neural models to capture the entire context at once. Lastly, there is a dearth of annotated legal documents to train deep learning models. Previous state-of-the-art approaches for this task have focused on using neural models like BiLSTM-CRF or have explored different embedding techniques to achieve decent results. While such techniques have shown that better embedding can result in improved model performance, not many models have focused on utilizing attention for learning better embeddings in sentences of a document. Additionally, it has been recently shown that advanced techniques like multi-task learning can help the models learn better representations, thereby improving performance. In this paper, we combine these two aspects by proposing a novel family of multi-task learning-based models for rhetorical role labeling, named MARRO, that uses transformer-inspired multi-headed attention. Using label shift as an auxiliary task, we show that models from the MARRO family achieve state-of-the-art results on two labeled datasets for rhetorical role labeling, from the Indian and UK Supreme Courts. PubDate: 2025-03-18
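The multi-headed attention building block at the heart of such models can be sketched in PyTorch: self-attention over a sentence's token embeddings, pooled into a sentence vector that a downstream sequence labeller would consume. Dimensions and data are illustrative.

```python
# Sketch: multi-headed self-attention over token embeddings, pooled into a
# single sentence vector.
import torch
import torch.nn as nn

tokens = torch.randn(1, 12, 128)  # (batch, tokens, embedding dim)
attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)

attended, weights = attn(tokens, tokens, tokens)  # self-attention over tokens
sentence_vec = attended.mean(dim=1)               # pooled sentence embedding
print(sentence_vec.shape)                         # torch.Size([1, 128])
# A sequence model over these sentence vectors would then predict rhetorical
# roles, with label shift as an auxiliary task in the multi-task setup.
```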
Abstract: The legal industry is characterized by dense and complex documents, which necessitate automatic processing methods to manage and analyse large volumes of data. Traditional methods for extracting legal information depend heavily on substantial quantities of annotated data during the training phase. However, a question arises as to how to extract information effectively in contexts that do not favour the use of annotated data. This study investigates the application of Large Language Models (LLMs) as a transformative solution for the extraction of legal terms, presenting a novel approach to overcoming the constraints associated with the need for extensive annotated datasets. Our research delved into methods such as prompt engineering and fine-tuning to enhance their performance. We evaluated four LLMs (GPT-4, Miqu-1-70b, Mixtral-8x7b, and Mistral-7b) and compared their performance to rule-based and BERT systems, within the scope of limited annotated data availability. We implemented and assessed our methodologies using Luxembourg’s traffic regulations as a case study. Our findings underscore the capacity of LLMs to successfully handle legal term extraction, emphasizing the benefits of one-shot and zero-shot learning in reducing reliance on annotated data, reaching an F1 score of 0.690. Moreover, our study sheds light on optimal practices for employing LLMs in the processing of legal information, offering insights into the challenges and limitations, including issues related to term boundary extraction. PubDate: 2025-03-14