About KARGA Markets
A technical deep-dive into AI-powered financial knowledge graphs
πThe Problem
Traditional financial data platforms store information in isolated silos. Stock prices live in one database, government contracts in another, SEC filings in a third, and prediction markets somewhere else entirely.
When you want to answer questions like "Which S&P 500 companies with significant government contracts are mentioned in prediction markets?" - you'd need to manually query multiple systems, export data, and perform complex joins in spreadsheets.
There had to be a better way.
β‘The Solution: KARGA
KARGA Markets combines three powerful technologies:
1. Knowledge Graphs
Data stored as interconnected nodes and relationships using ArangoDB, enabling complex multi-hop queries across disparate data sources in milliseconds.
2. Retrieval Augmented Generation
AI (GPT-4) generates precise database queries from natural language, then analyzes results with full context - no hallucination, only real data.
3. Semantic Search
Vector embeddings enable concept-based search - find "cybersecurity contracts" even when documents use terms like "network security" or "threat detection."
π§System Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β USER INTERFACE β
β Next.js 14 + React + Framer Motion + Tailwind CSS β
β β’ Natural Language Query Input β
β β’ Interactive Graph Visualization (ReactFlow) β
β β’ Real-time Market Cards β
β β’ Data Tables with Filtering/Sorting β
βββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββββββββ
β HTTPS / REST API
βββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββββ
β FASTAPI BACKEND β
β Python 3.13 + FastAPI + Pydantic β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Query Pipeline (Parallel Execution) β β
β β ββββββββββββββ ββββββββββββββββ ββββββββββββββββ β β
β β β GPT-4 β β Perplexity β β ArangoDB β β β
β β β Intent β β Web Search β β Graph Query β β β
β β β Detection β β (Current β β (Historical β β β
β β β β β Events) β β Data) β β β
β β ββββββ¬ββββββββ ββββββββ¬ββββββββ ββββββββ¬ββββββββ β β
β β β β β β β
β β βββββββββββββββββββ΄ββββββββββββββββββββ β β
β β β β β
β β ββββββββΌβββββββββββ β β
β β β GPT-4 Synthesis β β β
β β β Combines Resultsβ β β
β β ββββββββββββββββββββ β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β Security: Rate Limiting β’ Input Validation β’ HSTS Headers β
βββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββββββββ
β AQL Queries
βββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββββ
β ARANGODB CLOUD β
β Multi-Model Database (Document + Graph + Search) β
β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β Document β β Graph β β Vector β β
β β Collections β β Edges β β Embeddings β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β
β β’ 500+ companies (S&P 500) β
β β’ 2M+ daily market data points (OHLCV + 40 indicators) β
β β’ 100K+ government contract awards (with embeddings) β
β β’ 50K+ SEC filings (10-K, 10-Q, 8-K) β
β β’ 20K+ prediction markets (Polymarket + Kalshi) β
β β’ FRED economic indicators β
β β’ CFTC commodity positions β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββπData Sources & Integration
π Stock Market Data
- β’Source: Yahoo Finance API
- β’Coverage: All S&P 500 companies
- β’Data: OHLCV, volume, market cap, P/E ratios, technical indicators (SMA, EMA, MACD, RSI, Bollinger Bands)
- β’Update Frequency: Daily via Airflow DAG
π Government Contracts
- β’Source: USASpending.gov API
- β’Coverage: Federal contract awards to public companies
- β’Data: Award amounts, agencies, descriptions, dates
- β’Special Feature: Vector embeddings for semantic search (find "AI contracts" without exact keyword match)
π Prediction Markets
- β’Sources: Polymarket API, Kalshi API
- β’Coverage: Politics, economics, sports, entertainment
- β’Data: Probabilities, volumes, liquidity, traders
- β’Connection: Graph edges link markets to mentioned companies (e.g., "Will Tesla reach $300?" β Tesla stock)
π SEC Filings
- β’Source: SEC EDGAR API
- β’Types: 10-K (annual), 10-Q (quarterly), 8-K (events)
- β’Processing: Parsed into sections and sentences
- β’Sentiment: FinBERT scores for each sentence (-1 to +1)
π Economic Indicators
- β’Source: Federal Reserve Economic Data (FRED)
- β’Data: S&P 500 index, Fed funds rate, unemployment, GDP, yield curves
- β’Coverage: Historical time series data
π Commodity Positions
- β’Source: CFTC Commitments of Traders Report
- β’Data: Long/short positions by trader type (commercial, non-commercial, retail)
- β’Commodities: Oil, gold, wheat, corn, natural gas, etc.
πKnowledge Graph Structure
Data isn't just storedβit's connected. Here's how relationships enable powerful queries:
Company β Market Data
HAS_MARKETDATA edges connect companies to their daily stock prices, enabling queries like "Show me tech companies with SMA_50 > SMA_200 (golden cross)"
Company β Government Awards
HAS_AWARD edges link companies to contracts, enabling semantic searches: "Defense companies with cybersecurity contracts over $10M"
Market β Company
market_mentions_company edges connect prediction markets to mentioned tickers: "Tesla reaches $300" β TSLA
Company β SEC Filings β Sentences
HAS_FILING β has_section β has_sentenceMulti-hop traversal for sentiment analysis: "Show negative FinBERT sentences from recent Apple 10-Ks"
Company β Commodity Positions
HAS_COMMODITY_POSITION links companies to CFTC data for commodity exposure analysis
π‘ Example Multi-Hop Query
"Find energy companies with government contracts mentioning 'renewable' that are mentioned in prediction markets with volume > $50k"
β Traverses Company β Awards (semantic search) β Markets (graph join) in milliseconds
π€AI Query Pipeline
When you ask a question, here's what happens behind the scenes:
Step 1: Intent Detection
GPT-4 classifies your query: Is it about a specific ticker (AAPL, MSFT) or a concept (AI, cybersecurity)? This determines whether to use exact matching or semantic search.
Intent: concept_query
Step 2: Query Planning
GPT-4 receives the full database schema (collections, fields, relationships) and generates optimized AQL (ArangoDB Query Language) with proper joins and filters.
FOR award IN Award
FILTER COSINE_SIMILARITY(
award.description_embedding,
@query_vector
) >= 0.75
FOR company IN Company
FILTER company.ticker == award.ticker
RETURN {company, award}Step 3: Parallel Execution
Two queries run simultaneously:
- β’ Database Query: AQL executes against ArangoDB (historical data)
- β’ Web Search: Perplexity searches for current events (real-time context)
Step 4: Synthesis & Analysis
GPT-4 combines database results with web context, analyzes patterns, and generates:
- β’ Markdown Tables: Formatted results with key metrics
- β’ Insights: Trends, correlations, anomalies
- β’ Follow-up Questions: Suggested deeper dives
π§Technology Stack
Frontend
- Framework: Next.js 14 (App Router)
- UI: React 18, TypeScript
- Styling: Tailwind CSS
- Animations: Framer Motion
- Graph Viz: ReactFlow
- Hosting: Vercel
Backend
- Framework: FastAPI (Python 3.13)
- Validation: Pydantic
- Security: SlowAPI rate limiting
- LLM: OpenAI GPT-4
- Web Search: Perplexity AI
- Hosting: Railway
Database
- Platform: ArangoDB Cloud
- Type: Multi-model (Document + Graph)
- Query Language: AQL
- Embeddings: OpenAI text-embedding-3-small
- Size: ~5GB (2M+ documents)
- Location: Germany (GDPR compliant)
Data Pipeline
- Orchestration: Apache Airflow
- Processing: Python, Pandas, NumPy
- Sentiment: FinBERT
- Schedule: Daily updates at 2 AM UTC
- Monitoring: Airflow UI + logs
β‘Performance & Scale
Performance Optimizations
- βPersistent indexes on ticker, date, volume fields
- βSkip-list indexes for range queries
- βEdge collections for O(1) relationship lookups
- βQuery result caching (5-minute TTL)
- βParallel DB + web search execution
- βStreaming results with batch_size=1000
πFuture Enhancements
β‘ Real-time Data
WebSocket connections for live market data updates, streaming prediction market probability changes as they happen.
π Portfolio Tracking
User accounts to track favorite companies, save queries, and set up alerts for specific market conditions.
π€ Advanced ML Models
Time-series forecasting with LSTM, anomaly detection for unusual trading patterns, correlation discovery between data sources.
π More Data Sources
Twitter sentiment, Reddit discussions, earnings call transcripts, patent filings, and international market data.
π Custom Dashboards
Drag-and-drop dashboard builder with custom charts, metrics, and KPIs tailored to individual research needs.
π API Access
Public API with authentication for programmatic access to KARGA capabilities, enabling integrations with trading platforms.
πOpen Source & Contributions
KARGA Markets is built with transparency in mind. While the core application is proprietary, we're exploring open-sourcing components of the query planning system and graph schema to help others build similar systems.
Interested in collaborating or have ideas for improvement? Reach out at karga.analytics@gmail.com