Overview

Progrid AI Lead Research (progrid_ai_research) is a CRM module that automates the process of discovering, qualifying, and importing business leads from the web. It is designed for sales teams, marketing departments, and business development professionals who need a scalable way to identify new prospects without manual research.

Purpose

Traditional lead research involves visiting dozens of websites, copying contact details into spreadsheets, and manually assessing whether a business is active. This module automates that entire workflow by combining web search APIs, content extraction, and large language model (LLM) analysis into a single pipeline.

Each research job goes through six phases automatically:

Search – Query web search providers to find relevant business URLs
Fetch – Download and extract content from discovered web pages
Normalize – Use AI to extract structured business data (name, emails, phones, services)
Score – Assign each business an activity status and a confidence percentage
Store – Save results with supporting evidence and metadata
Deliver – Create CRM leads, enrich partner records, or export to CSV

Target users

This module is built for two primary audiences:

Sales and marketing teams

Sales representatives and marketing professionals use AI Lead Research to discover new prospects in specific industries, regions, or niches. Typical use cases include:

Finding local service providers who may need a new website or software solution
Identifying competitors in a given market
Discovering businesses with outdated web presences that could benefit from modernization

System administrators

Administrators configure search providers, manage API keys, set rate limits, and monitor cache utilization. They also control which users can create research jobs and which can only view results.

Provider ecosystem

The module integrates with three external service providers. All communication happens via secure HTTPS API calls.

Search providers

Two web search providers are supported. You can enable either one on its own or both together (mixed mode).

Provider	Description	Free tier	Configuration
Brave Search	Privacy-focused web search API with strong coverage of business directories and local results.	2,000-5,000 queries/month	CRM ‣ AI Research ‣ Configuration ‣ Settings
Tavily	AI-optimized search API designed for LLM applications. Returns pre-processed content snippets.	1,000 credits/month	CRM ‣ AI Research ‣ Configuration ‣ Settings

LLM provider

Provider	Description	Free tier	Configuration
Groq	High-speed LLM inference using Llama 3.1 70B. Handles both data extraction (normalize phase) and activity scoring (score phase).	Generous free tier with rate limits	CRM ‣ AI Research ‣ Configuration ‣ Settings

Pipeline concept

Every research job follows the same six-phase pipeline. Understanding these phases helps you interpret job status and troubleshoot issues.
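To make the idea of "which phase did my job reach" concrete, here is a minimal sketch of an ordered phase runner. It is not the module's code: the `Phase` enum, the `run_job` helper, the dictionary-based job, and the stop-on-first-failure behavior are all assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, Dict


class Phase(Enum):
    """The six documented phases, in execution order."""
    SEARCH = "search"
    FETCH = "fetch"
    NORMALIZE = "normalize"
    SCORE = "score"
    STORE = "store"
    DELIVER = "deliver"


def run_job(handlers: Dict[Phase, Callable[[dict], None]], job: dict) -> dict:
    """Run every phase in order and record how far the job got.

    Assumption: a failing phase stops the job. The real module may
    retry or skip individual URLs instead.
    """
    job["completed"] = []
    job["failed_phase"] = None
    for phase in Phase:  # Enum iteration follows definition order
        try:
            handlers[phase](job)
        except Exception as exc:
            job["failed_phase"] = phase.value
            job["error"] = str(exc)
            break
        job["completed"].append(phase.value)
    return job
```

With a record like this, a job that reports `failed_phase == "normalize"` tells you that searching and fetching worked and that the problem lies in the LLM extraction step.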

Phase 1: Search

The system generates up to five query variations from your input parameters (target description, location, industry). These queries are sent to the configured search provider(s) – Brave, Tavily, or both in mixed mode. The raw search results (URLs and snippets) are collected for the next phase.
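The following sketch shows one way query variations could be derived from the three inputs and how results from two providers could be merged. The query templates, the `build_queries` and `merge_results` names, and the URL-based merge are hypothetical; only the cap of five queries comes from the description above.

```python
from typing import Dict, List, Optional

MAX_QUERIES = 5  # documented cap on query variations


def build_queries(target: str, location: Optional[str] = None,
                  industry: Optional[str] = None) -> List[str]:
    """Combine the inputs into at most MAX_QUERIES distinct queries."""
    target = " ".join(target.split())  # collapse stray whitespace
    candidates = [target]
    if location:
        candidates.append(f"{target} {location}")
        candidates.append(f"{target} near {location}")
    if industry:
        candidates.append(f"{industry} {target}")
    if location and industry:
        candidates.append(f"{industry} companies in {location}")
        candidates.append(f"{target} {industry} {location} contact")

    seen, queries = set(), []
    for query in candidates:
        key = query.lower()
        if key not in seen:  # drop case-insensitive duplicates, keep order
            seen.add(key)
            queries.append(query)
    return queries[:MAX_QUERIES]


def merge_results(*provider_results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge result lists from several providers, keeping one entry per URL.

    Assumption: each result is a dict with "url" and "snippet" keys.
    """
    merged: Dict[str, Dict[str, str]] = {}
    for results in provider_results:
        for item in results:
            merged.setdefault(item["url"].rstrip("/"), item)
    return list(merged.values())
```

Merging by URL matters mostly in mixed mode, where both providers are likely to return the same high-ranking pages.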

Phase 2: Fetch

Each discovered URL is fetched and its content is extracted. The module uses the Trafilatura library for content extraction, with a regex-based fallback for pages that Trafilatura cannot parse. A 7-day content cache (model `progrid.fetch.cache`) prevents redundant downloads of the same URL, which reduces outbound requests and speeds up subsequent jobs that target similar businesses.
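The extraction fallback and the 7-day cache can be sketched as follows. This is an illustration under assumptions, not the module's implementation: the in-memory `FetchCache` stands in for the database-backed cache model, the `download` callable is a hypothetical hook for the HTTP request, and the tag-stripping regex is only one possible fallback. Trafilatura is imported optionally so the sketch also runs without it.

```python
import html
import re
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional, Tuple

try:
    import trafilatura  # optional here; the module uses it as primary extractor
except ImportError:
    trafilatura = None

CACHE_TTL = timedelta(days=7)  # documented cache lifetime
# Drop whole <script>/<style> blocks first, then any remaining tag.
_TAG_RE = re.compile(r"(?is)<(script|style)\b.*?</\1\s*>|<[^>]+>")


def regex_extract(page_html: str) -> str:
    """Crude fallback: strip tags, decode entities, collapse whitespace."""
    text = _TAG_RE.sub(" ", page_html)
    return " ".join(html.unescape(text).split())


def extract_text(page_html: str) -> str:
    """Prefer Trafilatura; fall back to the regex when it yields nothing."""
    if trafilatura is not None:
        extracted = trafilatura.extract(page_html)
        if extracted:
            return extracted
    return regex_extract(page_html)


class FetchCache:
    """In-memory stand-in for the persistent fetch cache."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[datetime, str]] = {}

    def get(self, url: str, now: datetime) -> Optional[str]:
        entry = self._entries.get(url)
        if entry and now - entry[0] < CACHE_TTL:
            return entry[1]
        return None  # missing or expired

    def put(self, url: str, text: str, now: datetime) -> None:
        self._entries[url] = (now, text)


def fetch_content(url: str, download: Callable[[str], str],
                  cache: FetchCache, now: Optional[datetime] = None) -> str:
    """Return cached text when fresh, otherwise download and extract."""
    now = now or datetime.now()
    cached = cache.get(url, now)
    if cached is not None:
        return cached
    text = extract_text(download(url))
    cache.put(url, text, now)
    return text
```

Passing `download` in as a parameter keeps the cache logic independent of the HTTP client, which also makes it easy to exercise with a fake downloader.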

Phase 3: Normalize

The extracted page content is sent to the Groq LLM with a structured prompt. The AI extracts:

Business name – The official name of the company
Email addresses – All contact emails found on the page
Phone numbers – Phone and fax numbers
Services offered – What the business provides
Activity signals – Evidence of recent activity (blog posts, news, updated copyright dates)
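As a rough illustration of the structured prompt and of turning the model's reply into the fields listed above, consider the sketch below. The prompt wording, the JSON key names, the `BusinessRecord` dataclass, and the 6,000-character truncation are assumptions; the actual prompt and schema used by the module are not documented here. The call to the LLM itself is deliberately left out.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List

FIELDS = ("business_name", "emails", "phones", "services", "activity_signals")

PROMPT_TEMPLATE = (
    "Extract business details from the page text below. "
    "Reply with a single JSON object with keys: {keys}. "
    "Use null for an unknown name and [] for empty lists.\n\n"
    "PAGE TEXT:\n{content}"
)


@dataclass
class BusinessRecord:
    business_name: str = ""
    emails: List[str] = field(default_factory=list)
    phones: List[str] = field(default_factory=list)
    services: List[str] = field(default_factory=list)
    activity_signals: List[str] = field(default_factory=list)


def build_prompt(content: str, max_chars: int = 6000) -> str:
    """Fill the template, truncating long pages (the limit is an assumption)."""
    return PROMPT_TEMPLATE.format(keys=", ".join(FIELDS), content=content[:max_chars])


def _as_list(value: Any) -> List[str]:
    """Coerce null, a scalar, or a list into a list of non-empty strings."""
    if value is None:
        return []
    if not isinstance(value, (list, tuple)):
        value = [value]
    return [str(v).strip() for v in value if str(v).strip()]


def parse_reply(reply: str) -> BusinessRecord:
    """Parse the model's reply, tolerating prose around the JSON object."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in LLM reply")
    data = json.loads(reply[start:end + 1])
    return BusinessRecord(
        business_name=(data.get("business_name") or "").strip(),
        emails=[e.lower() for e in _as_list(data.get("emails"))],
        phones=_as_list(data.get("phones")),
        services=_as_list(data.get("services")),
        activity_signals=_as_list(data.get("activity_signals")),
    )
```

Defensive parsing of this kind is useful because LLM replies do not always match the requested schema exactly: a field may come back as a single string, as null, or be missing altogether.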

Phase 4: Score

A second LLM call evaluates the extracted data and assigns:

Confidence score (0-100%) – How confident the system is in the accuracy of the extracted data
Activity status – One of three categories:
- active – Clear evidence of recent business activity
- unclear – Insufficient information to determine status
- inactive – Signs the business may be closed or dormant
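Because both values come from an LLM, it is prudent to validate them before storing. The sketch below clamps the confidence to the documented 0-100 range and restricts the status to the three documented categories. The `sanitize_score` helper and its fallbacks (0 for an unreadable score, `unclear` for an unknown status) are assumptions, not documented behavior.

```python
import math
from typing import Any, Tuple

VALID_STATUSES = ("active", "unclear", "inactive")


def sanitize_score(raw_confidence: Any, raw_status: Any) -> Tuple[float, str]:
    """Return (confidence, status) constrained to the documented values."""
    try:
        # Accept numbers as well as strings such as "87%".
        confidence = float(str(raw_confidence).strip().rstrip("%"))
    except ValueError:
        confidence = 0.0
    if math.isnan(confidence):
        confidence = 0.0
    confidence = max(0.0, min(100.0, confidence))

    status = str(raw_status or "").strip().lower()
    if status not in VALID_STATUSES:
        status = "unclear"  # assumed fallback for unexpected labels
    return confidence, status
```

For example, `sanitize_score("87%", "Active")` yields `(87.0, "active")`, while an out-of-range score of 150 with an unknown label is reduced to `(100.0, "unclear")`.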

Phase 5: Store

Results are saved as `progrid.research.result` records linked to the parent job. Each result includes the extracted business data, the confidence score, the activity status, and the raw evidence text that the LLM used for its evaluation.

Phase 6: Deliver

Based on the job’s configured deliverables, the system can:

Create CRM leads – Results with confidence above 50% are automatically converted into CRM leads (crm.lead records) with pre-filled contact information
Enrich existing partners – If a matching partner is found by email or website domain, the existing record is updated rather than creating a duplicate
Export to CSV – Generate a downloadable CSV file with all results for external use

Note

Deduplication during the deliver phase uses email address and website domain matching to prevent creating duplicate leads or partner records in your CRM.
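The deliver-phase rules described above (the 50% threshold, matching on email or website domain, and CSV export) can be sketched with plain dictionaries. This is not the module's ORM code: the dictionary keys, the `plan_delivery` and `to_csv` helpers, the CSV columns, and the choice to enrich matched partners regardless of confidence are all assumptions for illustration.

```python
import csv
import io
from typing import Dict, List, Tuple
from urllib.parse import urlparse

LEAD_THRESHOLD = 50.0  # leads are created only above this confidence


def website_domain(url: str) -> str:
    """Normalize a URL to a bare host name, without a leading 'www.'."""
    if not url:
        return ""
    if "://" not in url:
        url = "http://" + url  # let urlparse see a host for bare domains
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def plan_delivery(results: List[dict], partners: List[dict]
                  ) -> Tuple[List[dict], List[Tuple[int, dict]]]:
    """Split results into leads to create and partners to enrich."""
    by_email: Dict[str, int] = {}
    by_domain: Dict[str, int] = {}
    for partner in partners:
        if partner.get("email"):
            by_email[partner["email"].lower()] = partner["id"]
        domain = website_domain(partner.get("website", ""))
        if domain:
            by_domain[domain] = partner["id"]

    to_create, to_enrich, seen = [], [], set()
    for result in results:
        emails = [e.lower() for e in result.get("emails", [])]
        domain = website_domain(result.get("website", ""))
        partner_id = next((by_email[e] for e in emails if e in by_email), None)
        if partner_id is None and domain:
            partner_id = by_domain.get(domain)
        keys = set(emails) | ({domain} if domain else set())
        if partner_id is not None:
            to_enrich.append((partner_id, result))  # update, do not duplicate
        elif result.get("confidence", 0) > LEAD_THRESHOLD and not (keys & seen):
            to_create.append(result)
            seen |= keys  # also dedupe within the same batch
    return to_create, to_enrich


def to_csv(results: List[dict]) -> str:
    """Serialize all results, whatever their confidence, to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "emails", "website", "confidence", "status"])
    for r in results:
        writer.writerow([r.get("name", ""), ";".join(r.get("emails", [])),
                         r.get("website", ""), r.get("confidence", ""),
                         r.get("status", "")])
    return buffer.getvalue()
```

Normalizing domains before comparison means that `https://www.example.com/about` and `example.com` are treated as the same business, which is the kind of match the deduplication note describes.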

Module information

Technical name	`progrid_ai_research`
Version	18.0.1.1.0
Category	CRM
Dependencies	`crm`, `mail`, `base`, `base_setup`, `utm`