lemkin-resources

Document Classifier Model

Overview

A deep learning model that classifies legal documents into categories relevant to international criminal justice proceedings.

Model Architecture

Base Model: BERT-based transformer
Fine-tuned on: 50,000+ legal documents
Output Classes: 15 document categories
Model Size: 450 MB
Parameters: 110M

Performance Metrics

Accuracy: 94.2%
F1 Score: 0.93
Precision: 0.92
Recall: 0.94
Inference Time: ~120ms per document
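
As a quick consistency check, the reported F1 score can be recomputed from the precision and recall above, since F1 is their harmonic mean. This is only a sketch: it assumes the three figures are aggregated the same way (for example, all micro-averaged). If they are macro-averaged per class, the identity need not hold exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures taken from the Performance Metrics list above.
reported_precision = 0.92
reported_recall = 0.94

f1 = f1_score(reported_precision, reported_recall)
print(f"F1: {f1:.2f}")  # prints "F1: 0.93"
```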

Installation

pip install torch transformers

Usage

from document_classifier import DocumentClassifier

# Initialize model
classifier = DocumentClassifier()

# Load pretrained weights
classifier.load_pretrained('path/to/model.pth')

# Classify document
result = classifier.classify('path/to/document.pdf')
print(f"Category: {result['category']}")
print(f"Confidence: {result['confidence']}")
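
To classify several files at once, the single-document call above can be wrapped in a small loop. The helper below is a hypothetical convenience function, not part of the package. It assumes only that `classify` accepts a path string and returns a dictionary, as in the usage example.

```python
from pathlib import Path


def classify_directory(classifier, directory, pattern="*.pdf"):
    """Classify every file in `directory` matching `pattern`.

    `classifier` is any object exposing classify(path) -> dict.
    Returns a {filename: result} mapping, in sorted filename order.
    """
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        results[path.name] = classifier.classify(str(path))
    return results
```

With a loaded `classifier`, `classify_directory(classifier, 'path/to/documents')` would return one result dictionary per PDF in that folder.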

API Reference

`DocumentClassifier.classify(document_path, return_probs=False)`

Classifies a document into one of the predefined categories.

Parameters:

document_path (str): Path to the document file
return_probs (bool): Return probability distribution over all classes

Returns:

Dictionary with 'category' and 'confidence' keys
If return_probs=True, also includes a 'probabilities' key
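
The sketch below illustrates how the three keys could relate to one another. It rests on assumptions this page does not confirm: that confidence is the highest softmax probability, and that 'probabilities' maps each label to its probability. The function name, labels, and scores are hypothetical.

```python
import math


def build_result(logits, labels, return_probs=False):
    """Turn raw class scores into a dict shaped like classify()'s return value."""
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The predicted category is the class with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    result = {"category": labels[best], "confidence": probs[best]}
    if return_probs:
        result["probabilities"] = dict(zip(labels, probs))
    return result


# Hypothetical labels and scores, for illustration only.
labels = ["label_a", "label_b", "label_c"]
result = build_result([2.0, 0.5, -1.0], labels, return_probs=True)
print(result["category"])  # prints "label_a"
```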

Training Data

The model was trained on a diverse dataset including:

International Criminal Court documents
Human rights tribunal proceedings
National court documents (anonymized)
NGO investigation reports
News articles about legal proceedings

Limitations

Optimized for English-language documents
May require retraining for specific legal jurisdictions
Performance may vary on handwritten or heavily redacted documents

Model Files

model.pth - PyTorch model weights
config.json - Model configuration
tokenizer/ - Tokenizer files
labels.json - Category labels mapping
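
The layout of labels.json is not documented here. As a sketch, assuming it is a JSON object that maps class indices (as strings) to category names, it could be read like this. The example contents and the `load_label_map` helper are hypothetical.

```python
import json

# Hypothetical contents; the real labels.json may use a different layout.
example_labels_json = '{"0": "category_a", "1": "category_b", "2": "category_c"}'


def load_label_map(text):
    """Parse an index-to-label JSON object into {int: str}."""
    raw = json.loads(text)
    return {int(index): name for index, name in raw.items()}


id2label = load_label_map(example_labels_json)
print(id2label[1])  # prints "category_b"
```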

License

MIT License - See LICENSE file for details

Citation

@misc{document_classifier_2024,
  title={Document Classifier for Legal Proceedings},
  author={Lemkin AI},
  year={2024},
  version={1.0}
}
