Overview¶
Progrid AI Lead Research (progrid_ai_research) is a CRM module that automates the process of
discovering, qualifying, and importing business leads from the web. It is designed for sales teams,
marketing departments, and business development professionals who need a scalable way to identify
new prospects without manual research.
Purpose¶
Traditional lead research involves visiting dozens of websites, copying contact details into spreadsheets, and manually assessing whether a business is active. This module automates that entire workflow by combining web search APIs, content extraction, and large language model (LLM) analysis into a single pipeline.
Each research job goes through six phases automatically:
Search – Query web search providers to find relevant business URLs
Fetch – Download and extract content from discovered web pages
Normalize – Use AI to extract structured business data (name, emails, phones, services)
Score – Evaluate each business’s activity level with a confidence percentage
Store – Save results with supporting evidence and metadata
Deliver – Create CRM leads, enrich partner records, or export to CSV
Target users¶
This module is built for two primary audiences:
Sales and marketing teams¶
Sales representatives and marketing professionals use AI Lead Research to discover new prospects in specific industries, regions, or niches. Typical use cases include:
Finding local service providers who may need a new website or software solution
Identifying competitors in a given market
Discovering businesses with outdated web presences that could benefit from modernization
System administrators¶
Administrators configure search providers, manage API keys, set rate limits, and monitor cache utilization. They also control which users can create research jobs and which can only view results.
Provider ecosystem¶
The module integrates with three external service providers. All communication happens via secure HTTPS API calls.
Search providers¶
Two web search providers are supported. You can use either one or both simultaneously.
| Provider | Description | Free tier | Configuration |
|---|---|---|---|
| Brave Search | Privacy-focused web search API with strong coverage of business directories and local results. | 2,000-5,000 queries/month | |
| Tavily | AI-optimized search API designed for LLM applications. Returns pre-processed content snippets. | 1,000 credits/month | |
LLM provider¶
| Provider | Description | Free tier | Configuration |
|---|---|---|---|
| Groq | High-speed LLM inference using Llama 3.1 70B. Handles both data extraction (normalize phase) and activity scoring (score phase). | Generous free tier with rate limits | |
Pipeline concept¶
Every research job follows the same six-phase pipeline. Understanding these phases helps you interpret job status and troubleshoot issues.
Phase 1: Search¶
The system generates up to five query variations from your input parameters (target description, location, industry). These queries are sent to the configured search provider(s) – Brave, Tavily, or both in mixed mode. The raw search results (URLs and snippets) are collected for the next phase.
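The query-variation step can be pictured with the minimal sketch below. The function name `build_queries`, the word combinations, and the de-duplication are illustrative assumptions, since the module's actual query templates are not documented here. Only the cap of five variations and the three inputs (target description, location, industry) come from the description above.

```python
def build_queries(target, location="", industry="", limit=5):
    """Return up to `limit` distinct search queries.

    The combinations below are hypothetical; they only illustrate how one
    set of inputs can be expanded into several query variations.
    """
    candidates = [
        (target, location),
        (industry, target, location),
        (target, location, "contact"),
        (target, "companies", location),
        (target, industry, "email phone"),
        (target, location, "official website"),
    ]
    queries = []
    for parts in candidates:
        # Join only the non-empty parts so missing inputs leave no gaps.
        query = " ".join(p.strip() for p in parts if p and p.strip())
        if query and query not in queries:
            queries.append(query)
    return queries[:limit]


variations = build_queries("plumber", location="Austin", industry="home services")
```

In mixed mode, each of these variations would be sent to both search providers and the resulting URLs merged before the fetch phase.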
Phase 2: Fetch¶
Each discovered URL is fetched and its content extracted. The module uses the Trafilatura library
for intelligent content extraction, with a regex-based fallback for pages that resist standard
parsing. A 7-day content cache (model progrid.fetch.cache) prevents redundant downloads of
the same URL, reducing API usage and speeding up subsequent jobs targeting similar businesses.
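The effect of the 7-day cache can be sketched as follows. This is an in-memory stand-in, not the module's cache model: the class `FetchCache`, the helper `fetch_with_cache`, and the `download` callable are hypothetical names, and the content extraction step is left out. Only the 7-day retention period is taken from the text.

```python
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(days=7)  # retention period stated in the documentation


class FetchCache:
    """In-memory stand-in for the fetch cache (illustrative only)."""

    def __init__(self):
        self._entries = {}  # url -> (fetched_at, content)

    def get(self, url, now=None):
        now = now or datetime.now(timezone.utc)
        entry = self._entries.get(url)
        if entry is None:
            return None
        fetched_at, content = entry
        if now - fetched_at >= CACHE_TTL:
            # Entry is too old: drop it so the page is downloaded again.
            del self._entries[url]
            return None
        return content

    def put(self, url, content, now=None):
        self._entries[url] = (now or datetime.now(timezone.utc), content)


def fetch_with_cache(url, cache, download, now=None):
    """Return cached content when fresh, otherwise call `download(url)`."""
    cached = cache.get(url, now=now)
    if cached is not None:
        return cached
    content = download(url)
    cache.put(url, content, now=now)
    return content
```

Passing `now` explicitly keeps the expiry logic easy to test; in normal use it defaults to the current UTC time.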
Phase 3: Normalize¶
The extracted page content is sent to the Groq LLM with a structured prompt. The AI extracts:
Business name – The official name of the company
Email addresses – All contact emails found on the page
Phone numbers – Phone and fax numbers
Services offered – What the business provides
Activity signals – Evidence of recent activity (blog posts, news, updated copyright dates)
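To make the idea of structured extraction concrete, the sketch below parses a JSON reply into a typed record. It assumes the LLM is prompted to answer in JSON; the key names (`name`, `emails`, `phones`, `services`, `activity_signals`), the `BusinessRecord` class, and the cleanup rules are illustrative assumptions rather than the module's actual schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class BusinessRecord:
    """Hypothetical container mirroring the fields listed above."""
    name: str = ""
    emails: list = field(default_factory=list)
    phones: list = field(default_factory=list)
    services: list = field(default_factory=list)
    activity_signals: list = field(default_factory=list)


def _as_list(value):
    """Coerce a string or list into a list of non-empty, trimmed strings."""
    if isinstance(value, str):
        value = [value]
    if not isinstance(value, list):
        return []
    return [str(item).strip() for item in value if str(item).strip()]


def parse_llm_reply(reply_text):
    """Parse a JSON reply into a BusinessRecord, or return None if unusable."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return BusinessRecord(
        name=str(data.get("name") or "").strip(),
        # Lower-case and de-duplicate emails so later matching is consistent.
        emails=sorted({e.lower() for e in _as_list(data.get("emails"))}),
        phones=_as_list(data.get("phones")),
        services=_as_list(data.get("services")),
        activity_signals=_as_list(data.get("activity_signals")),
    )
```

Tolerating missing keys and malformed replies matters here because LLM output is not guaranteed to match the requested structure.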
Phase 4: Score¶
A second LLM call evaluates the extracted data and assigns:
Confidence score (0-100%) – How confident the system is in the accuracy of the extracted data
Activity status – One of three categories:
active – Clear evidence of recent business activity
unclear – Insufficient information to determine status
inactive – Signs the business may be closed or dormant
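A scoring reply still needs to be checked before it is stored. The sketch below shows one possible validation step, assuming the LLM returns a raw confidence value and a status string. The function name `normalize_score` and the choice to fall back to `unclear` for unrecognized statuses are assumptions; the 0-100 range and the three status values come from the list above.

```python
import math

VALID_STATUSES = ("active", "unclear", "inactive")


def normalize_score(raw_confidence, raw_status):
    """Clamp confidence to 0-100 and map unknown statuses to 'unclear'."""
    try:
        confidence = float(raw_confidence)
    except (TypeError, ValueError):
        confidence = 0.0
    if math.isnan(confidence):
        confidence = 0.0
    confidence = max(0.0, min(100.0, confidence))

    status = str(raw_status or "").strip().lower()
    if status not in VALID_STATUSES:
        # Assumed fallback: treat anything unrecognized as undetermined.
        status = "unclear"
    return confidence, status
```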
Phase 5: Store¶
Results are saved as progrid.research.result records linked to the parent job. Each result
includes the extracted business data, confidence score, activity status, and the raw evidence text
that the LLM used for its evaluation.
Phase 6: Deliver¶
Based on the job’s configured deliverables, the system can:
Create CRM leads – Results with confidence above 50% are automatically converted into CRM leads (crm.lead records) with pre-filled contact information
Enrich existing partners – If a matching partner is found by email or website domain, the existing record is updated rather than creating a duplicate
Export to CSV – Generate a downloadable CSV file with all results for external use
Note
Deduplication during the deliver phase uses email address and website domain matching to prevent creating duplicate leads or partner records in your CRM.
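The threshold and deduplication rules can be sketched in plain Python. The result dictionaries, the helper names (`website_domain`, `select_new_leads`, `to_csv`), the CSV columns, and the stripping of a leading `www.` are all illustrative assumptions; the real module works on CRM records. Only the "above 50%" threshold and the matching on email address and website domain come from the text.

```python
import csv
import io
from urllib.parse import urlparse

LEAD_THRESHOLD = 50.0  # leads are created only above this confidence


def website_domain(url):
    """Return the host of a URL, lower-cased, without a leading 'www.'."""
    host = urlparse(url if "://" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def select_new_leads(results, known_emails=(), known_domains=()):
    """Keep results above the threshold that match no known email or domain."""
    seen_emails = {e.lower() for e in known_emails}
    seen_domains = {d.lower() for d in known_domains}
    leads = []
    for result in results:
        if result["confidence"] <= LEAD_THRESHOLD:
            continue
        emails = {e.lower() for e in result.get("emails", [])}
        domain = website_domain(result.get("website", ""))
        if emails & seen_emails or (domain and domain in seen_domains):
            continue  # a match would be enriched instead of duplicated
        seen_emails |= emails
        if domain:
            seen_domains.add(domain)
        leads.append(result)
    return leads


def to_csv(results):
    """Serialize results to CSV text with a hypothetical column layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "website", "emails", "confidence"])
    for r in results:
        writer.writerow([r["name"], r.get("website", ""),
                         ";".join(r.get("emails", [])), r["confidence"]])
    return buffer.getvalue()
```

Tracking emails and domains seen earlier in the same job also prevents two results for the same business from producing two leads.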
Module information¶
| Technical name | progrid_ai_research |
| Version | 18.0.1.1.0 |
| Category | CRM |
| Dependencies | |