Attorney Search: Improving Access to Legal Aid with NLP and Fair Matching

Hi, I’m Taro Iyadomi and this is Attorney Search, a project I developed for the 2023 UCLA DataFest.

I. A Valuable Platform with Structural Gaps

The American Bar Association’s Free Legal Answers platform connects low-income individuals with volunteer attorneys who provide free legal advice online. It plays an important role in expanding access to justice. However, the platform has three structural issues that limit its effectiveness:

Manual Categorization by Clients:
Clients must select a legal category when submitting their question. Many struggle because of language barriers, unfamiliarity with legal terms, or misinterpretation, which leads to inaccurate or inconsistent categorization.
Fragmented and Inconsistent Subcategories:
After submission, questions are manually reclassified into specific subcategories. Each U.S. state maintains its own labeling system, which has produced over 340 subcategories, many of them duplicative or semantically overlapping. This fragmentation makes reclassification inefficient and error-prone.
No Intelligent Routing to Attorneys:
Questions are answered by the first available lawyer, regardless of that lawyer's expertise. Without routing by specialization, the quality of legal responses suffers.

II. The Solution: Attorney Search

Attorney Search is a proof-of-concept tool designed to improve the classification and routing of legal questions. It combines modern natural language processing (NLP) with demographic-aware scoring to achieve three main goals:

Automatically classify legal questions using a fine-tuned large language model.
Simplify and consolidate subcategories through unsupervised clustering.
Match each question to the most relevant attorneys based on legal topic and client context.

By removing the categorization burden from clients and applying intelligent attorney matching, Attorney Search aims to increase both the efficiency and the equity of the platform.

III. System Overview

The Attorney Search pipeline includes the following components:

i) Preprocessing

Standardizes text using lemmatization and removal of stopwords and punctuation.
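
As a rough illustration of this step, here is a minimal sketch using only the Python standard library. The stopword list is a tiny hypothetical sample, and `crude_lemma` is a simple suffix-stripping stand-in rather than a true lemmatizer; a real pipeline would more likely rely on a library such as NLTK or spaCy, which the post does not specify.

```python
import string

# Tiny illustrative stopword list (assumption); real lists are much longer.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "my", "i", "me", "to",
             "of", "and", "for", "in", "on", "can", "do", "he", "she", "it"}

def crude_lemma(token: str) -> str:
    """Stand-in for a real lemmatizer: strips a few common English suffixes."""
    for suffix, replacement in (("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")):
        # Only strip when at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: len(token) - len(suffix)] + replacement
    return token

def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, drop stopwords, then reduce each token."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [crude_lemma(t) for t in tokens if t not in STOPWORDS]
```

For example, `preprocess("My landlord is evicting me!")` returns `['landlord', 'evict']`.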

ii) Classification

Cleaned questions are passed through a fine-tuned DistilBERT model.
The model predicts both a category and a subcategory for the legal issue.
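
The prediction step might be wired up roughly as follows. This is a sketch under assumptions: the post does not say whether category and subcategory come from one model or two, so the code takes two separate classifier callables, and every function name here is hypothetical. Each callable is assumed to return a list of `{"label", "score"}` dicts, which is the shape a Hugging Face text-classification pipeline returns for a single string.

```python
from typing import Callable

# Any callable mapping text to a list of {"label": str, "score": float} dicts.
Classifier = Callable[[str], list[dict]]

def best_label(predictions: list[dict]) -> tuple[str, float]:
    """Return the highest-scoring (label, score) pair."""
    top = max(predictions, key=lambda p: p["score"])
    return top["label"], top["score"]

def classify_question(text: str, category_clf: Classifier,
                      subcategory_clf: Classifier) -> dict:
    """Predict a category and a subcategory for one cleaned question."""
    category, cat_score = best_label(category_clf(text))
    subcategory, sub_score = best_label(subcategory_clf(text))
    return {"category": category, "category_score": cat_score,
            "subcategory": subcategory, "subcategory_score": sub_score}
```

With the `transformers` library installed, each classifier could be built with `pipeline("text-classification", model=...)` pointed at a fine-tuned DistilBERT checkpoint; the checkpoint location is not given in the post.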

iii) Subcategory Simplification

Generates BERT-based embeddings for all subcategory labels.
Applies K-Means clustering to reduce the 340+ subcategories to 8 meaningful clusters.
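
To make the clustering idea concrete, here is a small standard-library sketch of K-Means (Lloyd's algorithm) applied to label vectors. The 2-D vectors and label names are hypothetical stand-ins for real BERT embeddings, and the toy uses `k=2` rather than the 8 clusters mentioned above; a real run would more likely use an existing implementation such as scikit-learn's.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments

# Toy 2-D vectors standing in for BERT label embeddings (hypothetical values).
label_vectors = {
    "Eviction": [0.9, 0.1],
    "Landlord/Tenant": [0.8, 0.2],
    "Child Custody": [0.1, 0.9],
    "Custody and Visitation": [0.2, 0.8],
}
names = list(label_vectors)
clusters = kmeans([label_vectors[n] for n in names], k=2)
merged = dict(zip(names, clusters))  # label -> consolidated cluster index
```

Labels that land in the same cluster index can then be treated as one consolidated subcategory.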

iv) Matching and Scoring

Combines predicted category/subcategory with user demographic data (ethnicity, gender, imprisonment status).
A scoring algorithm ranks attorneys by suitability.
Attorneys with higher scores are prioritized for assignment.
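
A hand-weighted scoring scheme of this kind could look like the sketch below. Everything specific in it is an assumption for illustration: the weights, the `Attorney` fields, and the way client context is compared against an attorney's profile are hypothetical, since the post does not publish the actual formula.

```python
from dataclasses import dataclass, field

# Hypothetical hand-set weights; the real values are not given in the post.
WEIGHTS = {"category": 3.0, "subcategory": 2.0, "context": 1.0}

@dataclass
class Attorney:
    name: str
    categories: set[str] = field(default_factory=set)
    subcategories: set[str] = field(default_factory=set)
    client_contexts: set[str] = field(default_factory=set)  # e.g. "incarcerated"

def score(attorney, category, subcategory, client_contexts):
    """Weighted sum of topic matches plus overlap with the client's context."""
    total = 0.0
    if category in attorney.categories:
        total += WEIGHTS["category"]
    if subcategory in attorney.subcategories:
        total += WEIGHTS["subcategory"]
    total += WEIGHTS["context"] * len(attorney.client_contexts & client_contexts)
    return total

def top_attorneys(attorneys, category, subcategory, client_contexts, n=3):
    """Return the n highest-scoring attorneys, best first."""
    ranked = sorted(
        attorneys,
        key=lambda a: score(a, category, subcategory, client_contexts),
        reverse=True,
    )
    return ranked[:n]
```

Keeping the weights in one dictionary makes the ranking easy to inspect and adjust by hand, which fits the interpretable approach described here.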

IV. Key Findings and Observations

My initial hypotheses concerned geographic and temporal variation, but visualizations (e.g., choropleth maps) showed no meaningful trends by state or season.

Instead, client demographics emerged as valuable for personalization. Although the choice of these features was driven by intuition, they provided a practical, interpretable foundation for matching questions to appropriate attorneys.

V. Limitations and Future Directions

While promising, the prototype has room for improvement:

Learned Scoring Models:
Replace the hand-crafted weights with a neural network trained on labeled feedback.
Feedback Loop:
Add a thumbs-up/thumbs-down rating system to gather data on match quality.
Expanded Features:
Include attorney specialties, languages spoken, and past performance for better personalization and fairness.

VI. Demo and Conclusion

A working demo of Attorney Search allows users to:

Input a legal question and demographic info
Receive:
- A predicted category and subcategory
- A list of three recommended attorneys ranked by the scoring algorithm

Attorney Search demonstrates how machine learning and language models can improve access to legal aid. By automating classification and optimizing attorney matching, tools like this can help legal platforms deliver faster, more accurate, and more equitable outcomes.

Posted on: March 18, 2024

Tags: NLP, BERT, Clustering
