⚠️ Not Financial Advice: This platform is for informational purposes only. Market data may be delayed. Always conduct your own research.

About KARGA Markets

A technical deep-dive into AI-powered financial knowledge graphs

📊 The Problem

Traditional financial data platforms store information in isolated silos. Stock prices live in one database, government contracts in another, SEC filings in a third, and prediction markets somewhere else entirely.

To answer a question like "Which S&P 500 companies with significant government contracts are mentioned in prediction markets?", you would have to query multiple systems by hand, export the data, and perform complex joins in spreadsheets.

There had to be a better way.

⚑ The Solution: KARGA

KARGA Markets combines three powerful technologies:

1. Knowledge Graphs

Data is stored in ArangoDB as interconnected nodes and relationships, enabling complex multi-hop queries across disparate data sources in milliseconds.

2. Retrieval-Augmented Generation

GPT-4 generates precise database queries from natural language, then analyzes the results with full context. Because answers are built from retrieved records rather than from the model's memory, the risk of hallucination is greatly reduced.

3. Semantic Search

Vector embeddings enable concept-based search: a search for "cybersecurity contracts" also finds documents that use terms like "network security" or "threat detection."
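Under the hood, concept-based search ranks documents by the cosine similarity between a query embedding and each document's embedding. The sketch below shows that ranking with tiny hand-made 3-dimensional vectors standing in for real embeddings (text-embedding-3-small vectors are far larger); the function names, sample documents, and the 0.75 default threshold are illustrative assumptions, not KARGA's code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(
    query_vector: Sequence[float],
    documents: Dict[str, Sequence[float]],
    threshold: float = 0.75,
) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs scoring at or above `threshold`, best match first."""
    scored = [(doc_id, cosine_similarity(query_vector, vec)) for doc_id, vec in documents.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy "embeddings": the first axis loosely means "security".
docs = {
    "network security upgrade": [0.9, 0.1, 0.0],
    "threat detection services": [0.8, 0.3, 0.1],
    "office furniture supply": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # stand-in embedding for "cybersecurity contracts"

for doc_id, score in semantic_search(query, docs):
    print(f"{score:.3f}  {doc_id}")
```

Neither matching document contains the word "cybersecurity"; both are found because their vectors point in a similar direction to the query vector.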

🔧 System Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                            USER INTERFACE                            │
│  Next.js 14 + React + Framer Motion + Tailwind CSS                   │
│  • Natural Language Query Input                                      │
│  • Interactive Graph Visualization (ReactFlow)                       │
│  • Real-time Market Cards                                            │
│  • Data Tables with Filtering/Sorting                                │
└─────────────────────┬────────────────────────────────────────────────┘
                      │ HTTPS / REST API
┌─────────────────────▼────────────────────────────────────────────────┐
│                           FASTAPI BACKEND                            │
│  Python 3.13 + FastAPI + Pydantic                                    │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────┐        │
│  │  Query Pipeline (Parallel Execution)                     │        │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │        │
│  │  │ GPT-4        │  │ Perplexity   │  │ ArangoDB     │    │        │
│  │  │ Intent       │  │ Web Search   │  │ Graph Query  │    │        │
│  │  │ Detection    │  │ (Current     │  │ (Historical  │    │        │
│  │  │              │  │  Events)     │  │  Data)       │    │        │
│  │  └───────┬──────┘  └───────┬──────┘  └───────┬──────┘    │        │
│  │          │                 │                 │           │        │
│  │          └─────────────────┼─────────────────┘           │        │
│  │                            │                             │        │
│  │                   ┌────────▼─────────┐                   │        │
│  │                   │ GPT-4 Synthesis  │                   │        │
│  │                   │ Combines Results │                   │        │
│  │                   └──────────────────┘                   │        │
│  └──────────────────────────────────────────────────────────┘        │
│                                                                      │
│  Security: Rate Limiting • Input Validation • HSTS Headers           │
└─────────────────────┬────────────────────────────────────────────────┘
                      │ AQL Queries
┌─────────────────────▼────────────────────────────────────────────────┐
│                            ARANGODB CLOUD                            │
│  Multi-Model Database (Document + Graph + Search)                    │
│                                                                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                │
│  │ Document     │  │ Graph        │  │ Vector       │                │
│  │ Collections  │  │ Edges        │  │ Embeddings   │                │
│  └──────────────┘  └──────────────┘  └──────────────┘                │
│                                                                      │
│  • 500+ companies (S&P 500)                                          │
│  • 2M+ daily market data points (OHLCV + 40 indicators)              │
│  • 100K+ government contract awards (with embeddings)                │
│  • 50K+ SEC filings (10-K, 10-Q, 8-K)                                │
│  • 20K+ prediction markets (Polymarket + Kalshi)                     │
│  • FRED economic indicators                                          │
│  • CFTC commodity positions                                          │
└──────────────────────────────────────────────────────────────────────┘

📊 Data Sources & Integration

📊 Stock Market Data

  • Source: Yahoo Finance API
  • Coverage: All S&P 500 companies
  • Data: OHLCV (open, high, low, close, volume), market cap, P/E ratios, technical indicators (SMA, EMA, MACD, RSI, Bollinger Bands)
  • Update Frequency: Daily via Airflow DAG
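The moving averages above are simple recurrences. Here is a minimal, standard-library-only sketch, assuming a plain list of daily closes (the pipeline's actual Pandas code is not shown on this page). SMA is the mean of the last n closes, and EMA applies the common smoothing factor α = 2/(n+1). Seeding the EMA with the first close is one convention among several; some libraries seed with the first n-day SMA instead, which changes the early values.

```python
from typing import List, Optional


def sma(closes: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until `window` closes are available."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, price in enumerate(closes):
        running += price
        if i >= window:
            running -= closes[i - window]  # drop the close that just left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def ema(closes: List[float], window: int) -> List[float]:
    """Exponential moving average with alpha = 2 / (window + 1), seeded with the first close."""
    if not closes:
        return []
    alpha = 2.0 / (window + 1)
    out = [closes[0]]
    for price in closes[1:]:
        out.append(alpha * price + (1 - alpha) * out[-1])
    return out


closes = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up prices
print(sma(closes, 3))
print(ema(closes, 3))
```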

📋 Government Contracts

  • Source: USASpending.gov API
  • Coverage: Federal contract awards to public companies
  • Data: Award amounts, agencies, descriptions, dates
  • Special Feature: Vector embeddings for semantic search (find "AI contracts" without an exact keyword match)

📈 Prediction Markets

  • Sources: Polymarket API, Kalshi API
  • Coverage: Politics, economics, sports, entertainment
  • Data: Probabilities, volumes, liquidity, traders
  • Connection: Graph edges link markets to mentioned companies (e.g., "Will Tesla reach $300?" → Tesla stock)
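This page doesn't specify how market-to-company edges are derived. One plausible approach, sketched here as an assumption, is to scan each market's title for known company names or tickers and emit one edge document per match. The `_from`/`_to` fields follow ArangoDB's edge-document convention. The alias table, the `Market`/`Company` collection names, and using tickers as company keys are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical alias table: lowercase company name or ticker -> canonical ticker.
ALIASES: Dict[str, str] = {
    "tesla": "TSLA",
    "tsla": "TSLA",
    "apple": "AAPL",
    "aapl": "AAPL",
    "nvidia": "NVDA",
    "nvda": "NVDA",
}


def mentioned_tickers(title: str) -> List[str]:
    """Distinct tickers whose aliases appear as whole words in a market title, in order."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    found: List[str] = []
    for word in words:
        ticker = ALIASES.get(word)
        if ticker and ticker not in found:
            found.append(ticker)
    return found


def mention_edges(market_key: str, title: str) -> List[Dict[str, str]]:
    """Edge documents for a market_mentions_company collection (Market -> Company)."""
    return [
        {"_from": f"Market/{market_key}", "_to": f"Company/{ticker}"}
        for ticker in mentioned_tickers(title)
    ]


print(mention_edges("m123", "Will Tesla reach $300?"))
```

Single-word matching is deliberately naive: it misses multi-word names and can misfire on ambiguous words (a market about apple harvests would link to AAPL), so a production linker would need phrase matching or a model-based entity resolver.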

📋 SEC Filings

  • Source: SEC EDGAR API
  • Types: 10-K (annual), 10-Q (quarterly), 8-K (events)
  • Processing: Parsed into sections and sentences
  • Sentiment: FinBERT scores for each sentence (-1 to +1)
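FinBERT is a three-class classifier (positive, negative, neutral), so a single -1 to +1 score has to be derived from its output. A common convention, assumed here because the page does not say which one KARGA uses, is P(positive) - P(negative). The sketch splits a section into sentences with a naive regex and takes the classifier as a parameter. `fake_finbert` is a hypothetical stand-in for a real model call.

```python
import re
from typing import Callable, Dict, List, Tuple

Probs = Dict[str, float]  # {"positive": p, "negative": p, "neutral": p}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentiment_score(probs: Probs) -> float:
    """Collapse class probabilities into [-1, 1] as P(positive) - P(negative)."""
    return probs["positive"] - probs["negative"]


def score_section(text: str, classify: Callable[[str], Probs]) -> List[Tuple[str, float]]:
    """Score every sentence in a filing section with the supplied classifier."""
    return [(s, sentiment_score(classify(s))) for s in split_sentences(text)]


def fake_finbert(sentence: str) -> Probs:
    """Hypothetical stand-in for FinBERT, keyed on a couple of words."""
    if "decline" in sentence:
        return {"positive": 0.05, "negative": 0.85, "neutral": 0.10}
    if "growth" in sentence:
        return {"positive": 0.80, "negative": 0.05, "neutral": 0.15}
    return {"positive": 0.10, "negative": 0.10, "neutral": 0.80}


section = "Revenue growth was strong. Margins may decline next year. The filing lists risks."
for sentence, score in score_section(section, fake_finbert):
    print(f"{score:+.2f}  {sentence}")
```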

📊 Economic Indicators

  • Source: Federal Reserve Economic Data (FRED)
  • Data: S&P 500 index, Fed funds rate, unemployment, GDP, yield curves
  • Coverage: Historical time series data

📊 Commodity Positions

  • Source: CFTC Commitments of Traders Report
  • Data: Long/short positions by trader type (commercial, non-commercial, and non-reportable, i.e. small/retail traders)
  • Commodities: Oil, gold, wheat, corn, natural gas, etc.
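A typical first transformation of Commitments of Traders data is net positioning per trader group: long minus short, and that net as a percentage of the group's gross position. The sketch below is illustrative only. The field names are hypothetical (they are not the CFTC's column names) and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """One trader group's positions in a single commodity (hypothetical fields)."""
    group: str
    long: int
    short: int

    @property
    def net(self) -> int:
        return self.long - self.short

    @property
    def net_pct_of_gross(self) -> float:
        gross = self.long + self.short
        return 100.0 * self.net / gross if gross else 0.0


positions = [
    Position("commercial", long=120_000, short=180_000),
    Position("non-commercial", long=150_000, short=90_000),
]
for p in positions:
    print(f"{p.group:>15}: net {p.net:+,} ({p.net_pct_of_gross:+.1f}% of gross)")
```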

🔗 Knowledge Graph Structure

Data isn't just stored; it's connected. Here's how relationships enable powerful queries:

Company → Market Data

HAS_MARKETDATA edges connect companies to their daily stock prices, enabling queries like "Show me tech companies with SMA_50 > SMA_200 (golden cross)"
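Strictly speaking, SMA_50 > SMA_200 is the state that follows a golden cross. The cross itself is the day the 50-day average moves from at-or-below the 200-day average to above it. A small sketch that finds those days, assuming both averages are already available per day; the short made-up series stand in for real values, and the function names are hypothetical.

```python
from typing import List, Sequence


def golden_cross_days(sma_fast: Sequence[float], sma_slow: Sequence[float]) -> List[int]:
    """Indices where the fast SMA moves from at/below the slow SMA to above it."""
    return [
        i
        for i in range(1, min(len(sma_fast), len(sma_slow)))
        if sma_fast[i - 1] <= sma_slow[i - 1] and sma_fast[i] > sma_slow[i]
    ]


def in_golden_cross_state(sma_fast: Sequence[float], sma_slow: Sequence[float]) -> bool:
    """True when the latest fast SMA is above the latest slow SMA (the filter quoted above)."""
    return sma_fast[-1] > sma_slow[-1]


sma_50 = [98.0, 99.0, 100.5, 101.0, 99.5, 100.8]
sma_200 = [100.0, 100.0, 100.0, 100.2, 100.1, 100.3]
print(golden_cross_days(sma_50, sma_200), in_golden_cross_state(sma_50, sma_200))
```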

Company → Government Awards

HAS_AWARD edges link companies to contracts, enabling semantic searches: "Defense companies with cybersecurity contracts over $10M"

Market → Company

market_mentions_company edges connect prediction markets to mentioned tickers: "Tesla reaches $300" → TSLA

Company → SEC Filings → Sentences

HAS_FILING → has_section → has_sentence edges enable multi-hop traversal for sentiment analysis: "Show negative FinBERT sentences from recent Apple 10-Ks"
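Conceptually, the filing-to-sentence query is a three-hop walk along typed edges. The sketch below imitates that walk over a small in-memory adjacency map so it runs without a database. The node keys, the edge layout, and the -0.5 cut-off for "negative" are hypothetical, and a real query would also filter filings by date to honor "recent". The comment shows roughly how the same walk might read in AQL, assuming outbound edge directions.

```python
from typing import Dict, List, Tuple

# Roughly analogous AQL (edge directions and field names are assumptions):
#   FOR f IN 1..1 OUTBOUND @company HAS_FILING
#     FOR sec IN 1..1 OUTBOUND f has_section
#       FOR s IN 1..1 OUTBOUND sec has_sentence
#         FILTER s.sentiment <= -0.5
#         RETURN s.text

edges: Dict[Tuple[str, str], List[str]] = {}  # (source, edge_type) -> targets
sentence_text: Dict[str, str] = {}
sentiment: Dict[str, float] = {}


def add_edge(src: str, edge_type: str, dst: str) -> None:
    edges.setdefault((src, edge_type), []).append(dst)


def hop(nodes: List[str], edge_type: str) -> List[str]:
    """Follow one edge type from every node in the current frontier."""
    return [dst for node in nodes for dst in edges.get((node, edge_type), [])]


def negative_sentences(company: str, threshold: float = -0.5) -> List[str]:
    filings = hop([company], "HAS_FILING")
    sections = hop(filings, "has_section")
    sentences = hop(sections, "has_sentence")
    return [sentence_text[s] for s in sentences if sentiment[s] <= threshold]


# Toy graph: AAPL -> one 10-K -> one section -> two sentences (made-up scores).
add_edge("AAPL", "HAS_FILING", "10K-2024")
add_edge("10K-2024", "has_section", "risk-factors")
for key, body, score in [
    ("s1", "Supply constraints could hurt margins.", -0.7),
    ("s2", "Services revenue reached a record.", 0.8),
]:
    add_edge("risk-factors", "has_sentence", key)
    sentence_text[key], sentiment[key] = body, score

print(negative_sentences("AAPL"))
```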

Company → Commodity Positions

HAS_COMMODITY_POSITION edges link companies to CFTC data for commodity exposure analysis.

💡 Example Multi-Hop Query

"Find energy companies with government contracts mentioning 'renewable' that are mentioned in prediction markets with volume > $50k"

→ Traverses Company → Awards (semantic search) → Markets (graph join) in milliseconds
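Expressed as one parameterized AQL query, the example might look like the sketch below. It assumes edge directions (Company → Award via HAS_AWARD, Market → Company via market_mentions_company), a `sector` field on companies, and a `volume` field on markets. None of these details are confirmed on this page; `description_embedding` is borrowed from the Step 2 query. Passing values as bind parameters instead of formatting them into the query string keeps user input out of the AQL text. With the python-arango driver, execution would be `db.aql.execute(query, bind_vars=bind_vars)`.

```python
import re
from typing import Any, Dict, List, Set, Tuple

MULTI_HOP_AQL = """
FOR company IN Company
  FILTER company.sector == @sector
  LET awards = (
    FOR award IN 1..1 OUTBOUND company HAS_AWARD
      FILTER COSINE_SIMILARITY(award.description_embedding, @query_vector) >= @threshold
      RETURN award._key
  )
  FILTER LENGTH(awards) > 0
  LET markets = (
    FOR market IN 1..1 INBOUND company market_mentions_company
      FILTER market.volume > @min_volume
      RETURN market._key
  )
  FILTER LENGTH(markets) > 0
  RETURN {ticker: company.ticker, awards, markets}
"""


def build_multi_hop_query(
    sector: str, query_vector: List[float], min_volume: float, threshold: float = 0.75
) -> Tuple[str, Dict[str, Any]]:
    """Return the AQL text plus bind variables; values are never spliced into the string."""
    bind_vars = {
        "sector": sector,
        "query_vector": query_vector,
        "threshold": threshold,
        "min_volume": min_volume,
    }
    return MULTI_HOP_AQL, bind_vars


def placeholders(aql: str) -> Set[str]:
    """Names of value bind parameters (@name, not @@collection) used in an AQL string."""
    return set(re.findall(r"(?<!@)@([A-Za-z_][A-Za-z0-9_]*)", aql))


query, bind_vars = build_multi_hop_query("Energy", [0.1, 0.2, 0.3], 50_000)
assert placeholders(query) == set(bind_vars)  # every placeholder has a value, and vice versa
# With python-arango: cursor = db.aql.execute(query, bind_vars=bind_vars)
```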

🤖 AI Query Pipeline

When you ask a question, here's what happens behind the scenes:

Step 1: Intent Detection

GPT-4 classifies your query: Is it about a specific ticker (AAPL, MSFT) or a concept (AI, cybersecurity)? This determines whether to use exact matching or semantic search.

Input: "Show me AI companies with government contracts"
Intent: concept_query
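The page says GPT-4 makes this ticker-versus-concept decision. For illustration only, a deterministic pre-check like the one below could handle obvious cases before calling the model. A query that contains a known ticker symbol is routed to exact matching, and anything else goes to semantic search. The ticker set and the `ticker_query` label are hypothetical; `concept_query` is the label shown above.

```python
import re
from typing import Set, Tuple

KNOWN_TICKERS: Set[str] = {"AAPL", "MSFT", "TSLA", "NVDA", "LMT"}  # hypothetical subset


def detect_intent(query: str) -> Tuple[str, Set[str]]:
    """("ticker_query", tickers) if known tickers appear, else ("concept_query", empty set)."""
    candidates = set(re.findall(r"\b[A-Z]{1,5}\b", query))  # short all-caps words
    tickers = candidates & KNOWN_TICKERS
    return ("ticker_query", tickers) if tickers else ("concept_query", set())


print(detect_intent("Show me AI companies with government contracts"))
print(detect_intent("Compare AAPL and MSFT margins"))
```

A curated ticker list matters here: short all-caps words like "AI" are ambiguous (AI is also a listed ticker symbol), which is one reason to leave the final call to the model.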

Step 2: Query Planning

GPT-4 receives the full database schema (collections, fields, relationships) and generates optimized AQL (ArangoDB Query Language) with proper joins and filters.

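// Semantic search: score each award description against the query embedding
FOR award IN Award
  LET score = COSINE_SIMILARITY(award.description_embedding, @query_vector)
  FILTER score >= 0.75
  // Join each matching award to its company on the shared ticker field
  FOR company IN Company
    FILTER company.ticker == award.ticker
    SORT score DESC
    RETURN {company, award, score}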
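Because the AQL in Step 2 is written by a model, it is prudent to check it before execution. The sketch below is an assumption, not a description of KARGA's validation layer: it rejects generated queries that contain AQL's data-modification keywords (INSERT, UPDATE, REPLACE, REMOVE, UPSERT). A keyword scan is deliberately conservative. It can reject a harmless query that merely mentions "update" in a string or attribute name, and it complements, but does not replace, running queries as a read-only database user.

```python
import re

# AQL data-modification operations; a read-only analytics query should contain none of them.
WRITE_KEYWORDS = ("INSERT", "UPDATE", "REPLACE", "REMOVE", "UPSERT")
_WRITE_RE = re.compile(r"\b(" + "|".join(WRITE_KEYWORDS) + r")\b", re.IGNORECASE)


def assert_read_only(aql: str) -> str:
    """Return the query unchanged, or raise ValueError if it contains a write keyword."""
    match = _WRITE_RE.search(aql)
    if match:
        raise ValueError(f"generated AQL contains write operation: {match.group(1).upper()}")
    return aql


generated = """
FOR award IN Award
  FILTER COSINE_SIMILARITY(award.description_embedding, @query_vector) >= 0.75
  FOR company IN Company
    FILTER company.ticker == award.ticker
    RETURN {company, award}
"""
assert_read_only(generated)  # passes: no write keywords
```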

Step 3: Parallel Execution

Two queries run simultaneously:

  • Database Query: AQL executes against ArangoDB (historical data)
  • Web Search: Perplexity searches for current events (real-time context)
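Running the two lookups concurrently means the wall-clock cost is roughly that of the slower call rather than the sum of both. A minimal asyncio sketch follows. `query_database` and `search_web` are hypothetical stand-ins for the ArangoDB and Perplexity calls, simulated with sleeps, and the returned row is made up.

```python
import asyncio
import time
from typing import Any, Dict, List


async def query_database(aql: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for an ArangoDB call; sleeps to simulate latency."""
    await asyncio.sleep(0.2)
    return [{"ticker": "ABC", "award_usd": 12_000_000}]  # made-up row


async def search_web(question: str) -> str:
    """Hypothetical stand-in for a Perplexity search; sleeps to simulate latency."""
    await asyncio.sleep(0.3)
    return "recent news context"


async def run_pipeline(question: str, aql: str) -> Dict[str, Any]:
    start = time.perf_counter()
    # gather() runs both coroutines concurrently and returns results in argument order.
    db_rows, web_context = await asyncio.gather(query_database(aql), search_web(question))
    return {"rows": db_rows, "context": web_context, "elapsed_s": time.perf_counter() - start}


result = asyncio.run(run_pipeline("Defense firms with cyber contracts?", "FOR c IN Company RETURN c"))
print(f"{result['elapsed_s']:.2f}s")
```

The elapsed time comes out near 0.3 s (the slower stub), not the 0.5 s that running the calls one after the other would take.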

Step 4: Synthesis & Analysis

GPT-4 combines database results with web context, analyzes patterns, and generates:

  • Markdown Tables: Formatted results with key metrics
  • Insights: Trends, correlations, anomalies
  • Follow-up Questions: Suggested deeper dives
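GPT-4 writes the tables during synthesis, but the same Markdown shape is easy to render deterministically from query rows. That can be useful as a fallback, or for checking the model's numbers against the database. A small sketch with made-up rows; the function name is hypothetical.

```python
from typing import Any, Dict, List, Sequence


def _cell(value: Any) -> str:
    # Escape pipes so a value cannot break the column layout.
    return str(value).replace("|", "\\|")


def to_markdown_table(rows: List[Dict[str, Any]], columns: Sequence[str]) -> str:
    """Render rows as a GitHub-style Markdown table in the given column order."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(_cell(row.get(col, "")) for col in columns) + " |" for row in rows]
    return "\n".join([header, divider, *body])


rows = [
    {"ticker": "ABC", "awards": 3, "total_usd": "12.5M"},
    {"ticker": "XYZ", "awards": 1, "total_usd": "4.0M"},
]
print(to_markdown_table(rows, ["ticker", "awards", "total_usd"]))
```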

🔧 Technology Stack

Frontend

  • Framework: Next.js 14 (App Router)
  • UI: React 18, TypeScript
  • Styling: Tailwind CSS
  • Animations: Framer Motion
  • Graph Viz: ReactFlow
  • Hosting: Vercel

Backend

  • Framework: FastAPI (Python 3.13)
  • Validation: Pydantic
  • Security: SlowAPI rate limiting
  • LLM: OpenAI GPT-4
  • Web Search: Perplexity AI
  • Hosting: Railway

Database

  • Platform: ArangoDB Cloud
  • Type: Multi-model (Document + Graph)
  • Query Language: AQL
  • Embeddings: OpenAI text-embedding-3-small
  • Size: ~5GB (2M+ documents)
  • Location: Germany (GDPR compliant)

Data Pipeline

  • Orchestration: Apache Airflow
  • Processing: Python, Pandas, NumPy
  • Sentiment: FinBERT
  • Schedule: Daily updates at 2 AM UTC
  • Monitoring: Airflow UI + logs

⚑ Performance & Scale

  • < 3s average query time (DB + AI analysis)
  • 2M+ documents in the graph (companies, markets, filings)
  • 50ms graph traversal time (3-hop relationships)

Performance Optimizations

  ✓ Persistent indexes on ticker, date, and volume fields
  ✓ Skip-list indexes for range queries (in current ArangoDB versions, "skiplist" is an alias of the persistent index type)
  ✓ Edge collections for O(1) relationship lookups
  ✓ Query result caching (5-minute TTL)
  ✓ Parallel DB + web search execution
  ✓ Streaming results with batch_size=1000
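A 5-minute result cache can be as simple as a dictionary keyed on the query text plus its bind variables, storing an expiry time with each value. The sketch below is one in-process way to do it; a shared store such as Redis would be needed if several backend instances must see the same cache. The class name and key scheme are assumptions. The clock is injectable so expiry can be tested without waiting.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process cache whose entries expire `ttl_seconds` after they are stored."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def make_key(aql: str, bind_vars: Dict[str, Any]) -> str:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} produce the same key
        return aql + "\n" + json.dumps(bind_vars, sort_keys=True)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # lazily evict the stale entry on read
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


now = [0.0]  # fake clock for the example
cache = TTLCache(ttl_seconds=300, clock=lambda: now[0])
key = TTLCache.make_key("FOR c IN Company RETURN c", {"limit": 10})
cache.set(key, ["row"])
now[0] = 299.0
print(cache.get(key))  # still cached
now[0] = 300.0
print(cache.get(key))  # expired
```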

📈 Future Enhancements

⚑ Real-time Data

WebSocket connections for live market data updates, streaming prediction market probability changes as they happen.

📊 Portfolio Tracking

User accounts to track favorite companies, save queries, and set up alerts for specific market conditions.

🤖 Advanced ML Models

Time-series forecasting with LSTM, anomaly detection for unusual trading patterns, correlation discovery between data sources.

📊 More Data Sources

Twitter sentiment, Reddit discussions, earnings call transcripts, patent filings, and international market data.

📊 Custom Dashboards

Drag-and-drop dashboard builder with custom charts, metrics, and KPIs tailored to individual research needs.

🔗 API Access

Public API with authentication for programmatic access to KARGA capabilities, enabling integrations with trading platforms.

🔓 Open Source & Contributions

KARGA Markets is built with transparency in mind. While the core application is proprietary, we're exploring open-sourcing components of the query planning system and graph schema to help others build similar systems.

Interested in collaborating or have ideas for improvement? Reach out at karga.analytics@gmail.com