How Actowiz built a two-layer pipeline that scrapes Martindale.com and enriches the results from U.S. law firm websites across 5 practice areas, with 99%+ data accuracy.
The client: a U.S.-based legal-technology firm operating an attorney and law-firm intelligence platform, which needed structured law firm and attorney data across five practice areas nationwide.
The client is a U.S.-based legal-technology firm operating an attorney and law-firm intelligence platform. Their platform serves law firms, legal marketing agencies, and legal data aggregators who need structured, accurate, and regularly refreshed data on attorneys and law firms across the United States.
To power their platform, the client required a high-volume, reliable source of structured law firm and attorney data. The data was sourced from Martindale.com, one of the most authoritative legal directories in the country, and then enriched from individual firm websites to capture attorney-level information that directories alone cannot provide.
Martindale.com aggregates profile pages for hundreds of thousands of law firms and attorneys across the United States and Canada. For the client's use case, it serves as the primary discovery layer — a structured directory with firm-level metadata that would be impossible to compile manually at scale.
However, Martindale has structural limitations that made it insufficient as a standalone source, chiefly the attorney-level detail and the currency that directory listings lack.
These gaps defined the two-layer approach: Martindale as the discovery and foundation layer, and individual firm websites as the enrichment layer.
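The project covered five targeted practice areas across U.S. law firms, with strict geography filtering applied throughout: Personal Injury, Labor & Employment, Social Security Disability, Workers' Compensation, and Medical Malpractice.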
Actowiz extracted a comprehensive set of structured fields from each qualifying law firm profile. Every record was filtered to U.S.-only firms before delivery.
| Field | Description |
|---|---|
| title | Name of the law firm as listed on Martindale |
| url | Martindale profile page link for the law firm |
| website_url | Official website of the law firm (if available) |
| total_employees | Approximate firm size or headcount (if listed) |
| description | Short summary or overview of the firm from Martindale |
| full_address | Complete address including street, city, state, and zip |
| street_address | Street-level address only |
| city | City of the firm's office |
| state | State of the firm's office |
| zip | Zip/postal code |
| country | Country — filtered to United States only |
| other_address | Secondary or alternate office address (if listed) |
| contact | Phone number or email (if publicly available) |
| about | Additional firm background or descriptive content |
| area_of_practice | Practice areas listed for the firm |
| people | Attorney names or count associated with the firm |
| number_of_attorneys | Total practicing attorneys at the firm |
| partner | Count of partners listed |
| member | Count of member-level attorneys |
| establish | Year the firm was established |
| director | Count of attorneys with the title Director |
| managing_partner | Count of Managing Partners |
| associate | Count of Associates |
| founder | Count of Founders listed |
| wrongful_death_lawyer | Count of attorneys specializing in wrongful death |
| principal | Count of Principals |
| attorney | General attorney count (where no sub-role is listed) |
| associate_attorney | Count titled Associate Attorney specifically |
| senior_associate | Count of Senior Associates |
| shareholder | Count of Shareholders |
| of_counsel | Count of Of Counsel attorneys |
| founding_partner | Count of Founding Partners |
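
The U.S.-only filter applied to these records could look roughly like the sketch below. It is a minimal illustration, not Actowiz's actual implementation: the field names (`country`, `state`, `zip`) come from the table above, while the helper names, the country aliases, and the exact validation rules are assumptions.

```python
import re

# Full state names mapped to postal codes. Only the 50 states are listed;
# DC and territories would need to be added if they are in scope.
STATE_NAMES = {
    "alabama": "AL", "alaska": "AK", "arizona": "AZ", "arkansas": "AR",
    "california": "CA", "colorado": "CO", "connecticut": "CT",
    "delaware": "DE", "florida": "FL", "georgia": "GA", "hawaii": "HI",
    "idaho": "ID", "illinois": "IL", "indiana": "IN", "iowa": "IA",
    "kansas": "KS", "kentucky": "KY", "louisiana": "LA", "maine": "ME",
    "maryland": "MD", "massachusetts": "MA", "michigan": "MI",
    "minnesota": "MN", "mississippi": "MS", "missouri": "MO",
    "montana": "MT", "nebraska": "NE", "nevada": "NV",
    "new hampshire": "NH", "new jersey": "NJ", "new mexico": "NM",
    "new york": "NY", "north carolina": "NC", "north dakota": "ND",
    "ohio": "OH", "oklahoma": "OK", "oregon": "OR", "pennsylvania": "PA",
    "rhode island": "RI", "south carolina": "SC", "south dakota": "SD",
    "tennessee": "TN", "texas": "TX", "utah": "UT", "vermont": "VT",
    "virginia": "VA", "washington": "WA", "west virginia": "WV",
    "wisconsin": "WI", "wyoming": "WY",
}
STATE_CODES = set(STATE_NAMES.values())

# Assumed spellings of the country value; real data may need more aliases.
US_COUNTRY_ALIASES = {
    "us", "usa", "u.s.", "u.s.a.", "united states",
    "united states of america",
}
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")


def normalize_state(value):
    """Return a two-letter code for a state name or code, else None."""
    text = (value or "").strip()
    if text.upper() in STATE_CODES:
        return text.upper()
    return STATE_NAMES.get(text.lower())


def is_us_record(record):
    """Keep a record only if country, state, and zip all look like U.S. values."""
    country = (record.get("country") or "").strip().lower()
    if country and country not in US_COUNTRY_ALIASES:
        return False
    if normalize_state(record.get("state")) is None:
        return False
    zip_code = (record.get("zip") or "").strip()
    # A missing zip is tolerated; a present one must be 5 or 5+4 digits.
    return not zip_code or bool(ZIP_RE.match(zip_code))
```

Checking the state as well as the country guards against profiles where the country field is blank or mislabeled, which matters for a directory that also lists Canadian firms.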
Every Martindale profile with an available website_url was passed into the enrichment layer. Actowiz built and deployed individual site crawlers to extract deeper attorney and firm intelligence directly from each firm's own website.
Law firm websites are highly heterogeneous — built across WordPress, Squarespace, custom builds, and legacy HTML, with attorney information structured differently on every site. Actowiz's AI-powered extraction handled this variation at scale across all five practice areas and all 50 U.S. states.
Attorney- and firm-level data enriched from firm websites:
| Field | Notes |
|---|---|
| Attorney / Lawyer count | Often listed on website; inferred from About Us, team pages, or firm descriptions |
| Employee count (total) | Includes attorneys; inferred from website where not explicitly stated. Range provided where substantiated (e.g., 10–20, or 100+ if site says “hundreds of employees”) |
| Primary state | Based on primary office address or firm description |
| Primary office location | Full address of main office |
| All office locations | All addresses listed on the website, comma-delimited |
| Total offices | Count of distinct office locations |
| Practice Areas | All practice areas listed |
| Primary Practice Area | Primary practice area, if known |
| Practice Alignment | Defense, Plaintiff, or Both |
| Practice Alignment Priority | Whether predominantly a plaintiff or defense firm |
| Is Law Firm? | Confirms whether the domain and record is indeed a law firm (Yes/No) |
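
Because employee counts may arrive as an exact number, a bounded range, or an open-ended value such as "100+", it can help to normalize them into one shape before delivery. The following is a hypothetical sketch: the `CountRange` type and `parse_count` helper are illustrative names, and the accepted input formats are assumed from the examples in the table above.

```python
import re
from typing import NamedTuple, Optional


class CountRange(NamedTuple):
    low: int
    high: Optional[int]  # None means open-ended, e.g. "100+"


# Accepts "45", "10-20", "10\u201320" (en dash), and "100+".
RANGE_RE = re.compile(r"^\s*(\d+)\s*(?:[-\u2013]\s*(\d+)|(\+))?\s*$")


def parse_count(text):
    """Parse an employee-count string into a CountRange, or None if unparseable."""
    match = RANGE_RE.match(text)
    if not match:
        return None
    low = int(match.group(1))
    if match.group(2):
        high = int(match.group(2))
        # Reject inverted ranges rather than silently swapping the bounds.
        return CountRange(low, high) if high >= low else None
    if match.group(3):
        return CountRange(low, None)
    return CountRange(low, low)
```

Free-text phrases such as "hundreds of employees" are deliberately not parsed here; under this sketch they would be mapped to a range upstream, before this function is called.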
To deliver both layers at scale across all five practice areas and all 50 U.S. states, Actowiz deployed a dedicated scraping and enrichment stack.
The five practice areas are described in non-standard language across thousands of firm websites. Firms write for clients, not data systems, so simple keyword matching fails. Actowiz deployed an LLM-assisted classification engine to map real-world language to the client's internal taxonomy.
| Firm Website Language | Classified As |
|---|---|
| “Hurt at work? We fight for your benefits.” | Workers' Compensation |
| “Occupational disease and WC appeals” | Workers' Compensation |
| “SSDI and SSI claims — denied benefits appeals” | Social Security Disability |
| “Getting disabled workers the income they deserve” | Social Security Disability |
| “Surgical error, misdiagnosis, and birth injury litigation” | Medical Malpractice |
| “Fighting insurance companies after a hospital mistake” | Medical Malpractice |
| “Wrongful termination, discrimination, and wage theft” | Labor & Employment |
| “Slip and fall, auto accidents, and catastrophic injury” | Personal Injury |
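
One way to keep an LLM-assisted classifier tied to a fixed taxonomy is to constrain the prompt to the allowed labels and reject any answer outside them. The sketch below illustrates that pattern only; it is not the engine Actowiz deployed. The `complete` parameter is a hypothetical stand-in for whatever LLM client is used, the prompt wording is an assumption, and one label per snippet is assumed for simplicity.

```python
from typing import Callable, Optional

# The five practice areas named in this case study.
TAXONOMY = (
    "Personal Injury",
    "Labor & Employment",
    "Social Security Disability",
    "Workers' Compensation",
    "Medical Malpractice",
)


def build_prompt(snippet: str) -> str:
    """Build a prompt that restricts the model to the allowed labels."""
    options = "\n".join(f"- {label}" for label in TAXONOMY)
    return (
        "Classify the law firm website text into exactly one practice area.\n"
        f"Allowed labels:\n{options}\n"
        "Answer with the label only, or NONE if no label fits.\n\n"
        f"Text: {snippet}"
    )


def classify(snippet: str, complete: Callable[[str], str]) -> Optional[str]:
    """Return a taxonomy label, or None if the model's answer is not one.

    `complete` is a hypothetical callable that sends a prompt to an LLM
    and returns its text response.
    """
    raw = complete(build_prompt(snippet))
    # Normalize whitespace, trailing punctuation, curly apostrophes, and case.
    cleaned = raw.strip().strip('."').replace("\u2019", "'").casefold()
    for label in TAXONOMY:
        if cleaned == label.casefold():
            return label
    return None
```

Returning `None` for anything off-taxonomy gives the human-in-the-loop QA step a clear queue of snippets to review instead of letting free-form model output into the dataset.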
Sample firm-level record (Martindale layer):

| Field | Value |
|---|---|
| title | Harrison & Bloom LLP |
| url | martindale.com/law-firms/harrison-bloom |
| website_url | harrisonbloomlaw.com |
| state | Texas |
| city | Houston |
| area_of_practice | Workers' Compensation, Personal Injury |
| number_of_attorneys | 12 |
| partner | 3 |
| associate | 7 |
| of_counsel | 2 |
| establish | 2004 |
| contact | (713) 555-0144 |

Sample attorney-level record (website enrichment layer):

| Field | Value |
|---|---|
| Attorney Name | Sandra M. Reyes |
| Firm | Harrison & Bloom LLP |
| Title | Senior Partner |
| Practice Focus | Workers' Compensation, Occupational Disease |
| Bar Admissions | Texas (2001), Louisiana (2005) |
| Law School | University of Houston Law Center |
| Languages | English, Spanish |
| Awards | Texas Super Lawyers 2019–2024 |
| Case Types | On-the-job injuries, toxic exposure, WC appeals |
| Direct Phone | (713) 555-0145 |
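
Joining the two layers requires a shared key, and the firm's website domain is a natural candidate because `website_url` is what routes a Martindale profile into enrichment. The sketch below shows one possible join under that assumption; `domain_key` and `merge_layers` are hypothetical helper names, and the input shapes are illustrative rather than the delivered schema.

```python
from urllib.parse import urlsplit


def domain_key(url):
    """Normalize a website URL to a bare lowercase host for joining."""
    text = (url or "").strip().lower()
    if not text:
        return None
    if "//" not in text:
        text = "//" + text  # lets urlsplit treat a bare host as the netloc
    host = urlsplit(text).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host or None


def merge_layers(firms, attorneys_by_site):
    """Attach website-derived attorney records to Martindale firm records.

    firms: list of dicts with a "website_url" field (Martindale layer).
    attorneys_by_site: dict mapping a crawled site URL to attorney dicts.
    """
    index = {}
    for site_url, attorneys in attorneys_by_site.items():
        key = domain_key(site_url)
        if key:
            index.setdefault(key, []).extend(attorneys)

    merged = []
    for firm in firms:
        key = domain_key(firm.get("website_url"))
        # Firms without a website keep an empty attorney list.
        merged.append({**firm, "attorneys": list(index.get(key, []))})
    return merged
```

Normalizing both sides to a bare host means `harrisonbloomlaw.com` in the directory and `https://www.harrisonbloomlaw.com/attorneys` from the crawler resolve to the same firm.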
The client received a unified, two-source dataset: structured firm-level records from Martindale, enriched with attorney-level depth from firm websites. It covered all five practice areas across the entire United States, spanning 15,000+ law firms and 85,000+ attorney records, and was delivered in 10 weeks.
Their platform could now offer precise attorney discovery by practice focus, state, role, bar admission, experience level, and case-type specialization — granularity no single legal directory provides. Records sourced directly from firm websites are more current and complete than directory-only profiles, giving the client a meaningful data-quality advantage over competitors relying on self-reported directory listings alone.
| Metric | Value |
|---|---|
| Primary Data Source | Martindale.com |
| Secondary Data Source | Individual U.S. law firm websites |
| Geography | United States only — all 50 states |
| Practice Areas Covered | Personal Injury; Labor & Employment; Social Security Disability; Workers' Compensation; Medical Malpractice |
| Firm-Level Fields Extracted | 30+ |
| Attorney-Level Enrichment Fields | 15+ |
| Delivery Formats | CSV, Excel |
| Geography Filter | U.S. state validation at extraction layer |
| QA Method | Automated + human-in-the-loop |
"What impressed us most was Actowiz's ability to handle complex legal websites at scale. Their two-layer data pipeline provided comprehensive law firm and attorney intelligence across all 50 U.S. states, enabling us to offer richer search capabilities and more accurate legal profiles than ever before."
— Chief Product Officer, Legal Technology Company
Actowiz Solutions designs custom, large-scale scraping and enrichment pipelines with 99%+ accuracy. Visit actowizsolutions.com to discuss your data requirement.