Data & ML Methodology
We move beyond simple aggregation. Our platform runs case data through a multi-stage pipeline of machine learning models that assess risk, forecast trends, and make complex legal material accessible.
Automated Data Collection
Our system aggregates data from verified human rights sources, primarily the OVD-Info API and the Memorial Human Rights Center. Automated scripts continuously synchronize our database with these sources so that records of arrests, sentences, and prisoner locations stay as current as the sources themselves.
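Keeping a local copy in sync with upstream sources usually comes down to an idempotent upsert keyed on a stable case identifier. The sketch below assumes each upstream record carries an `id` and an ISO 8601 `updated_at` timestamp; these field names are hypothetical, and the OVD-Info API's actual schema may differ.

```python
from datetime import datetime


def upsert_cases(local: dict[str, dict], incoming: list[dict]) -> tuple[int, int]:
    """Merge incoming case records into a local store keyed by case ID.

    Assumes (hypothetically) that every record has an 'id' and an ISO 8601
    'updated_at' field. A record replaces the stored copy only if it is
    strictly newer, so re-running the same batch changes nothing.
    Returns (inserted, updated) counts.
    """
    inserted = updated = 0
    for record in incoming:
        case_id = record["id"]
        existing = local.get(case_id)
        if existing is None:
            # First time we have seen this case.
            local[case_id] = record
            inserted += 1
            continue
        new_ts = datetime.fromisoformat(record["updated_at"])
        old_ts = datetime.fromisoformat(existing["updated_at"])
        if new_ts > old_ts:
            # Upstream has a more recent version (e.g. a new sentence or transfer).
            local[case_id] = record
            updated += 1
    return inserted, updated
```

Because stale or identical records are skipped, a sync that fails halfway can simply be re-run with the same batch.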
Entity Extraction & Structuring
We use natural language processing (NLP) to parse unstructured case summaries into structured records. This involves identifying legal actors (judges, investigators), categorizing the criminal articles charged (e.g., Article 207.3, the 'fake news' law), and extracting the vendors of any surveillance technology used in the arrest.
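Article numbers in Russian case summaries typically appear as abbreviations such as "ст. 207.3 УК РФ" or as inflected forms of "статья" (article). A regular expression is a plausible first pass before heavier NLP; the pattern and the two-entry label table below are illustrative assumptions, not our production extractor.

```python
import re

# Matches "ст. 207.3", "ст.280.3", "статья 275", "статье 207.3", "статьей 275", etc.
# \b is Unicode-aware for str patterns, so "ст" must start a word.
ARTICLE_RE = re.compile(
    r"\bст(?:\.|атья|атьи|атье|атьей)?\s*(\d{1,3}(?:\.\d+)?)",
    re.IGNORECASE,
)

# Illustrative labels only; a real table would cover the full Criminal Code.
ARTICLE_LABELS = {
    "207.3": "spreading 'fakes' about the armed forces",
    "280.3": "discrediting the armed forces",
}


def extract_articles(summary: str) -> list[str]:
    """Return article numbers in order of first appearance, without duplicates."""
    seen: list[str] = []
    for match in ARTICLE_RE.finditer(summary):
        article = match.group(1)
        if article not in seen:
            seen.append(article)
    return seen


def label_articles(summary: str) -> dict[str, str]:
    """Map each extracted article to a label, or 'unlabeled' if it is not in the table."""
    return {a: ARTICLE_LABELS.get(a, "unlabeled") for a in extract_articles(summary)}
```

Part numbers such as "ч. 2" are deliberately ignored here; capturing them would need a second group anchored on "ч.".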
Geocoding & Neural Translation
Location data is standardized and geocoded to coordinates via the OpenStreetMap Nominatim API for geospatial analysis. In parallel, case details are translated from Russian to English using Google Cloud Translation to make them accessible to an international audience.
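A minimal geocoding sketch using only the Python standard library against Nominatim's public `/search` endpoint. It assumes results are restricted to Russia via the `countrycodes` parameter and keeps only the top hit. Nominatim's usage policy requires an identifying `User-Agent` and allows at most one request per second on the public server, so a real pipeline would add caching and throttling.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(place: str, country_code: str = "ru") -> str:
    """Build a Nominatim free-text search URL that asks for the single best JSON hit."""
    params = {"q": place, "format": "json", "limit": 1, "countrycodes": country_code}
    # urlencode percent-encodes non-ASCII (e.g. Cyrillic) place names as UTF-8.
    return f"{NOMINATIM_SEARCH}?{urllib.parse.urlencode(params)}"


def parse_first_result(body: str) -> Optional[tuple[float, float]]:
    """Return (lat, lon) from a Nominatim JSON response, or None if nothing matched."""
    results = json.loads(body)
    if not results:
        return None
    # Nominatim returns coordinates as strings.
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(place: str, user_agent: str) -> Optional[tuple[float, float]]:
    """Query Nominatim once. Callers must throttle to at most 1 request per second."""
    request = urllib.request.Request(
        build_search_url(place), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_first_result(response.read().decode("utf-8"))
```

Splitting URL building and response parsing from the network call keeps both testable offline.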
Machine Learning Risk Assessment
We use XGBoost classifiers trained on historical case data to assign risk probabilities to new cases. The models weigh factors such as criminal article, age, gender, and location to estimate the likelihood that a case is 'Urgent' (immediate action required) and the risk of torture while in custody.
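Before a gradient-boosted model can score a case, each record has to become a fixed-length numeric vector. The sketch below shows one way to do that; the article and region vocabularies are hypothetical placeholders, and missing values are encoded as NaN, which XGBoost treats as missing by default.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical vocabularies; a real model would derive these from training data.
ARTICLES = ["207.3", "280.3", "205.2", "275"]
REGIONS = ["Moscow", "Saint Petersburg", "Crimea"]

NAN = math.nan  # XGBoost treats NaN as a missing value by default


@dataclass
class Case:
    articles: List[str]
    age: Optional[int]
    gender: str
    region: str


def feature_names() -> List[str]:
    return (
        [f"article_{a}" for a in ARTICLES]
        + ["age", "is_female"]
        + [f"region_{r}" for r in REGIONS]
        + ["region_other"]
    )


def encode(case: Case) -> List[float]:
    """Turn one case into a fixed-length vector aligned with feature_names()."""
    # Multi-hot: a defendant can be charged under several articles at once.
    vec = [1.0 if a in case.articles else 0.0 for a in ARTICLES]
    vec.append(float(case.age) if case.age is not None else NAN)
    gender_map = {"female": 1.0, "male": 0.0}
    vec.append(gender_map.get(case.gender, NAN))
    # One-hot region with a catch-all bucket for regions outside the vocabulary.
    vec.extend(1.0 if case.region == r else 0.0 for r in REGIONS)
    vec.append(0.0 if case.region in REGIONS else 1.0)
    return vec
```

Rows built this way can be stacked into a NumPy array and passed to `xgboost.XGBClassifier.predict_proba`, whose positive-class column is the risk probability.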
Predictive Forecasting & Network Topology
Our Python microservice runs Prophet time-series models to forecast arrest trends up to 90 days ahead. We also build network graphs that link cases by similarity (shared charges, location, and tactics) to detect coordinated repression campaigns and communities of related cases.
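Similarity linking can be sketched as a weighted score over shared charges, location, and tactics, with an edge wherever the score clears a threshold. The weights (0.5/0.2/0.3), the threshold (0.6), and the case fields are hypothetical, and the grouping step uses plain connected components as a simpler stand-in for a community-detection algorithm such as Louvain.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets; two empty sets count as no evidence (0.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(c1: dict, c2: dict, weights=(0.5, 0.2, 0.3)) -> float:
    """Weighted mix of shared charges, same region, and shared tactics."""
    w_charges, w_region, w_tactics = weights
    return (
        w_charges * jaccard(c1["charges"], c2["charges"])
        + w_region * (1.0 if c1["region"] == c2["region"] else 0.0)
        + w_tactics * jaccard(c1["tactics"], c2["tactics"])
    )


def build_graph(cases: dict[str, dict], threshold: float = 0.6) -> dict[str, set]:
    """Adjacency sets with an undirected edge between sufficiently similar cases."""
    adj: dict[str, set] = {cid: set() for cid in cases}
    for a, b in combinations(cases, 2):
        if similarity(cases[a], cases[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def clusters(adj: dict[str, set]) -> list[set]:
    """Connected components via iterative depth-first search."""
    seen: set = set()
    out: list[set] = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        out.append(component)
    return out
```

Comparing every pair is quadratic in the number of cases; at scale, blocking candidate pairs by region or article first would keep this tractable.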
Generative Legal Tools
We use large language models (LLMs) to provide generative tools for legal professionals. These include an Affidavit Generator, which combines prisoner data with country condition reports to draft supporting documents for asylum cases, and an automated system for generating 'Statement of Complicity' dossiers.
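One plausible shape for the Affidavit Generator's input is a prompt that keeps case facts and report excerpts separate and numbers each excerpt so the draft can cite it. The function and field names below are hypothetical, and the LLM call itself is omitted because the provider is not specified here.

```python
def build_affidavit_prompt(case: dict[str, str], excerpts: list[dict[str, str]]) -> str:
    """Assemble an LLM prompt from case facts and numbered report excerpts.

    The case fields and the 'title'/'excerpt' keys are hypothetical names.
    """
    facts = "\n".join(f"- {field}: {value}" for field, value in case.items())
    # Number each source so the model can cite it as [1], [2], ...
    sources = "\n".join(
        f"[{i}] {e['title']}: {e['excerpt']}" for i, e in enumerate(excerpts, start=1)
    )
    instructions = (
        "Draft a supporting statement for an asylum case. "
        "Use only the facts and sources below, cite sources by their [number], "
        "and mark any gap in the record instead of filling it."
    )
    return f"{instructions}\n\nCase facts:\n{facts}\n\nCountry condition sources:\n{sources}\n"
```

Restricting the model to supplied, numbered sources makes each claim in a draft traceable for the reviewing lawyer.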
A Note on Predictive Models
Our risk scores and forecasts are probabilistic estimates derived from historical data. They are meant to help researchers and legal professionals prioritize cases, not to replace human judgment. A "High Risk" score indicates that a case statistically resembles past cases involving torture or harsh sentences; the outcome of any individual case may differ.