This whitepaper presents ATLAS, a conceptual architecture for a natural language interface to IT Service Management (ITSM) systems, designed for mid-to-large enterprise environments. ATLAS employs a novel multi-agent AI architecture that transforms unstructured natural language queries into actionable insights from service desk data.
The proposed system features a three-stage AI pipeline for query analysis, data retrieval refinement, and conversational response generation, combined with a hybrid data synchronization strategy designed to maintain data freshness without impacting query performance. We present modelled evaluation results based on simulated workloads, projecting 90-95% query understanding accuracy with median response times under 4 seconds.
Key contributions include:
- A context-aware conversation management system enabling multi-turn dialogue with follow-up query understanding
- A thread-safe concurrency model for AI agent orchestration
- A fallback-resilient query parsing system with heuristic degradation
- Detailed cost analysis projecting significant reduction in time-to-insight versus traditional methods
- Comprehensive error analysis with mitigation strategies
We compare ATLAS against existing commercial solutions and open-source alternatives, demonstrating potential for superior context retention and domain-specific accuracy. The paper concludes with an honest assessment of anticipated limitations and a roadmap for addressing identified gaps.
Keywords: Natural Language Processing, IT Service Management, Multi-Agent Systems, Conversational AI, Enterprise Architecture, LLM Orchestration
Table of Contents
- Introduction
- Related Work
- Problem Statement
- System Architecture
- Multi-Agent AI Pipeline
- Prompt Engineering
- Conversation Context Management
- Data Synchronization Strategy
- Query Processing Engine
- Concurrency & Thread Safety
- Data Transformation Pipeline
- Security Considerations
- Evaluation & Results
- Limitations
- Architecture Decision Records
- Future Work
- Conclusion
- References
- Appendices
1. Introduction
1.1 Background
IT Service Management (ITSM) systems are the backbone of enterprise IT operations, handling service requests, incidents, and change management workflows at scale. In mid-to-large organizations, these systems typically process hundreds to thousands of tickets daily, generating valuable operational data that remains largely inaccessible to non-technical stakeholders.
While ITSM platforms excel at structured data storage and workflow automation, they present significant usability challenges:
- Query Complexity: Extracting insights requires knowledge of query syntax, report builders, or API integrations
- Data Accessibility: Operations managers, team leads, and executives struggle to access real-time metrics
- Context Switching: Users navigate multiple interfaces to correlate information
- Reporting Latency: Ad-hoc queries often require IT intervention, with turnaround times of hours to days
1.2 ATLAS Overview
ATLAS (Automated Ticket Language Analysis System) is designed to address these challenges by providing a natural language interface that allows users to query ITSM data conversationally. The system would transform queries like:
- "How many tickets does the support team have this month?"
- "Which support personnel have been inactive for two weeks?"
- "What hour had the most requests yesterday?"
- "How many of them are open?" (follow-up query)
into structured database operations, returning results in natural language with downloadable exports.
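As an illustrative sketch of this transformation (field names follow the JSON schema proposed in Section 6.1; the values are hypothetical), the second example query might be parsed into structured parameters such as:

```json
{
  "queryType": "inactive_technicians",
  "isConversational": false,
  "inactivityPeriod": "2 weeks",
  "dateFrom": null,
  "dateTo": null,
  "technician": null,
  "status": null
}
```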
1.3 Deployment Context
ATLAS is designed for deployment in mid-to-large enterprise environments with characteristics such as:
- Daily ticket volume: Hundreds to thousands of requests
- Active technicians: 100-500+ support staff
- User base: 50-200+ operations and management personnel
- Data volume: Tens of thousands of historical tickets
- Availability requirement: 99%+ uptime
1.4 Contributions
This whitepaper makes the following contributions:
- Multi-Agent Architecture: A three-stage AI pipeline separating query understanding, data validation, and response generation
- Context-Aware Conversations: Conversation management enabling follow-up queries with pronoun resolution and filter inheritance
- Hybrid Sync Strategy: Non-blocking background synchronization maintaining data freshness
- Proposed Patterns: Patterns for thread safety, fallback handling, and graceful degradation
- Modelled Evaluation: Projected accuracy metrics and error categorization
- Honest Limitations Assessment: Transparent discussion of anticipated system constraints and failure modes
2. Related Work
2.1 Commercial ITSM AI Solutions (2024-2025)
| System | Approach | Key Features (2024-2025) | Limitations vs ATLAS |
|---|---|---|---|
| ServiceNow Now Assist | GenAI with NowLLM + BYO model support | Incident summarization, AI Search with RAG "Genius Results," Skill Kit for custom GenAI (Xanadu release, Sept 2024) | Agent productivity focus; requires flow configuration; limited analytical query support |
| Freshworks Freddy AI | Three-tier AI (Self-Service, Copilot, Insights) | AI Agents with agentic workflows (Oct 2024), 80% query resolution claim, real-time sentiment, Slack/Teams integration | Customer-facing automation focus; limited internal analytics; no multi-turn data queries |
| Zendesk AI Agents | Essential + Advanced tiers with GPT-4o | Generative replies, intelligent triage, ticket summarization, auto-assist (March 2024 GA), custom intents | Ticket deflection design; limited data analytics; no follow-up query context |
ServiceNow Now Assist (2024): ServiceNow's Xanadu release (September 2024) introduced significant GenAI capabilities. Now Assist features include case/incident summarization, chat reply generation, and AI Search with RAG-based "Genius Results" that generate answers from knowledge articles. The platform supports bring-your-own LLM models and integrates with Microsoft Copilot. The Now Assist Skill Kit enables custom GenAI skill development. However, Now Assist primarily enhances agent productivity rather than enabling analytical querying of ticket data.
Freshworks Freddy AI (2024): Freddy AI evolved in 2024 to include three tiers: Self-Service (bots), Copilot (agent assistance), and Insights (analytics). The October 2024 update introduced GenAI-powered AI Agents with pre-built "agentic workflows" for e-commerce integrations (Shopify, Stripe, FedEx). Freddy AI Agent integrates with Slack and Microsoft Teams for 24/7 employee support. While Freddy excels at automated resolution, it lacks multi-turn analytical conversation capabilities.
Zendesk AI Agents (2024): Zendesk's March 2024 GA release brought generative AI features (summarize, expand, tone shift) upgraded to GPT-4o. The Advanced AI add-on includes intelligent triage with custom intents, auto-assist for guided resolution, and suggested first replies. Zendesk reports over 1.5 million monthly uses of these features. The platform optimizes agent workflows but does not address natural language analytics queries.
ATLAS Differentiation: Unlike commercial solutions that optimize agent workflows or automate ticket resolution, ATLAS specifically addresses the analytical query gap, enabling natural language questions about ticket data (technician performance, volume trends, inactive staff) with multi-turn context preservation.
2.2 Academic Research & Frameworks (2023-2025)
Text-to-SQL Systems
BIRD Benchmark (Li et al., 2024): The BIRD benchmark represents the current state-of-the-art evaluation standard for text-to-SQL, comprising 12,751 text-SQL pairs across 95 databases totaling 33.4 GB. Published at NeurIPS 2023 and continuously updated, BIRD emphasizes real-world challenges including dirty data, external knowledge requirements, and SQL efficiency (Valid Efficiency Score metric). As of late 2024, GPT-4 achieves approximately 54.89% execution accuracy on BIRD, significantly below human performance of 92.96%.
Spider 2.0 (Lei et al., 2024): Released in late 2024, Spider 2.0 further increases complexity with 632 enterprise-level workflow problems requiring interaction with cloud databases (BigQuery, Snowflake), queries exceeding 100 lines, and multi-step reasoning. Current state-of-the-art models achieve only approximately 6% accuracy on Spider 2.0, demonstrating significant remaining challenges.
ATLAS vs Text-to-SQL: ATLAS operates at a higher abstraction level than direct text-to-SQL:
- Query analysis produces semantic intents (top_technicians, influx_requests) rather than raw SQL
- Domain-specific query types enable optimized retrieval patterns
- Conversational context enables filter inheritance across turns (not addressed by text-to-SQL benchmarks)
Multi-Agent LLM Frameworks
Microsoft AutoGen (Wu et al., 2023; v0.4 January 2025): AutoGen pioneered the multi-agent conversation paradigm for LLM applications. Originally released in Fall 2023, AutoGen v0.4 (January 2025) introduced an actor-based architecture with asynchronous messaging, modular agent composition, and AutoGen Studio for no-code agent building. The framework supports diverse applications including code generation, task automation, and conversational agents.
LangGraph (LangChain, January 2024): LangGraph provides graph-based agent orchestration with native support for cycles, human-in-the-loop patterns, and persistent state management. The framework became widely adopted for production agents in 2024, with deployments at Klarna, Replit, and Uber. LangGraph's hierarchical team patterns (supervisor agents coordinating specialized agents) and the December 2024 "Command" primitive for multi-agent communication influenced ATLAS's pipeline design.
ATLAS Contributions vs Frameworks:
- Domain-specific ITSM agent specialization (vs general-purpose frameworks)
- Concurrency patterns designed around AI platform thread limitations
- Heuristic fallback system for graceful degradation (critical for enterprise reliability)
Retrieval-Augmented Generation (RAG)
RAG architectures have evolved significantly since foundational work in 2020. Key 2024 developments include:
- GraphRAG (Microsoft, mid-2024): Extracts knowledge graphs from text for hierarchical retrieval, addressing semantic gap challenges between queries and documents
- RAPTOR (Sarthi et al., 2024): Recursive abstractive processing for tree-organized retrieval, enabling multi-level document summarization
- Agentic RAG (2024): Integration of autonomous agents with RAG pipelines for dynamic retrieval triggering based on generation uncertainty
- RAG Evaluation Frameworks: RAGAS for reference-free metrics and RAGTruth corpus (Niu et al., 2024) for hallucination analysis
ATLAS vs RAG: ATLAS extends RAG principles to structured database retrieval:
- Retrieves from SQL databases rather than document stores
- Uses a refinement agent (Stage 2) to validate retrieval accuracy—analogous to RAG reranking
- Generates conversational responses grounded in retrieved structured data
3. Problem Statement
3.1 Hypothetical Pain Points
Based on analysis of typical ITSM workflows in mid-to-large enterprises, ATLAS is designed to address the following anticipated challenges:
| Metric | Typical Baseline | Target | Projected Outcome |
|---|---|---|---|
| Time to answer, "How many tickets does X have?" | 10-15 minutes | < 30 seconds | < 10 seconds |
| Ad-hoc report requests to IT team | 30-50/week | < 10/week | ~85% reduction |
| Self-service analytics adoption | 10-20% | > 60% | 70-80% |
| Manager access to real-time metrics | Limited | All managers | Broad access |
3.2 Design Requirements
| Requirement | Description | Priority | Validation Method |
|---|---|---|---|
| Natural Language Understanding | Parse unstructured queries into structured parameters | Critical | Accuracy testing |
| Context Awareness | Understand follow-up questions referencing previous queries | Critical | Multi-turn testing |
| Real-time Data | Query data current within 5 minutes | High | Sync latency measurement |
| Concurrent Access | Support 50+ simultaneous users | High | Load testing |
| Response Time | < 10 seconds for 95th percentile | High | Performance monitoring |
| Export Capability | Downloadable CSV results | Medium | Functional testing |
| Personalization | Support "my tickets" queries | Medium | User acceptance testing |
4. System Architecture
4.1 High-Level Architecture
Figure 1: System high-level architecture
4.2 Technology Stack
| Layer | Technology | Version | Rationale |
|---|---|---|---|
| Runtime | .NET | 8.0 LTS | Enterprise support, performance |
| Framework | ASP.NET Core | 8.0 | Native async, DI, middleware |
| ORM | Entity Framework Core | 8.0 | Type-safe queries, migrations |
| Database | SQL Server | 2019+ | Enterprise reliability, JSON support |
| AI Platform | Azure AI Agents | Preview | Persistent threads, managed infrastructure |
| Caching | IMemoryCache | Built-in | Token caching, low latency |
| Authentication | DefaultAzureCredential | Latest | Managed identity support |
5. Multi-Agent AI Pipeline
5.1 Pipeline Overview
ATLAS employs a three-stage AI pipeline where each agent has a specialized role:
Figure 2: Multi-agent pipeline
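As a minimal sketch of this staging (the type names and stub logic below are illustrative stand-ins for the AI agents, not the ATLAS API), the three stages compose as independent async steps, each of which can fail, retry, or fall back without affecting the others:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stage signatures; the stub logic stands in for real agent calls.
public record QueryAnalysis(string QueryType, string? Status);
public record RetrievalResult(int Count);

public static class PipelineSketch
{
    // Stage 1: query understanding (here, a trivial keyword stand-in).
    public static Task<QueryAnalysis> AnalyzeAsync(string query) =>
        Task.FromResult(new QueryAnalysis(
            query.Contains("inactive") ? "inactive_technicians" : "request_search",
            query.Contains("open") ? "open" : null));

    // Stage 2: data retrieval and refinement (stubbed with fixed counts).
    public static Task<RetrievalResult> RetrieveAsync(QueryAnalysis analysis) =>
        Task.FromResult(new RetrievalResult(analysis.Status == "open" ? 15 : 65));

    // Stage 3: conversational response generation (stubbed with a template).
    public static Task<string> RespondAsync(QueryAnalysis analysis, RetrievalResult result) =>
        Task.FromResult($"Found {result.Count} {analysis.Status ?? "matching"} tickets.");

    // Sequential composition: analysis output feeds retrieval, which feeds response.
    public static async Task<string> RunAsync(string query)
    {
        var analysis = await AnalyzeAsync(query);
        var data = await RetrieveAsync(analysis);
        return await RespondAsync(analysis, data);
    }
}
```

Because each stage has its own narrow contract, a failure in Stage 1 can be caught and rerouted to the heuristic fallback (Section 9.3) without touching the other stages.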
5.2 Agent Specialization Rationale
| Aspect | Single Agent | Three Agents (ATLAS) |
|---|---|---|
| Prompt Size | 3000+ tokens | ~800 tokens each |
| Failure Isolation | All-or-nothing | Isolated failure points |
| Debugging | Opaque | Clear stage identification |
| Quality | Compromised by competing objectives | Optimized per stage |
| Latency | Single long call | Parallelization potential |
| Cost | Higher per-call (longer prompts) | Lower aggregate |
5.3 Query Type Classification
Query Type Taxonomy
CONVERSATIONAL (No data retrieval)
- greeting: "hello", "hi", "good morning"
- help: "what can you do", "help", "?"
- thanks: "thank you", "thanks", "thx"
- farewell: "goodbye", "bye", "see you"
- unclear: ambiguous or very short queries (< 5 chars)
REQUEST_SEARCH (Default for data queries) [~65% of queries]
- By subject: "tickets with 'error' in subject"
- By technician: "tickets assigned to [name]"
- By requester: "tickets from [name]"
- By status: "open tickets", "closed requests"
- By date: "tickets from last week"
- Personalized: "my tickets", "assigned to me"
TOP_TECHNICIANS [~15% of queries]
- "top 10 technicians this month"
- "best performing technicians"
- "technician rankings past week"
INACTIVE_TECHNICIANS [~8% of queries]
- "technicians with no activity for 14 days"
- "inactive technicians this month"
- "who hasn't worked on tickets lately"
INFLUX_REQUESTS [~7% of queries]
- "busiest hour yesterday"
- "request volume by day this week"
- "when do we get the most tickets"
TOP_REQUEST_AREAS [~5% of queries]
- "most common request types today"
- "top categories this month"
- "what do users ask about most"
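The taxonomy above can be approximated by a small keyword router of the kind the fallback parser (Section 9.3) uses; the keyword lists here are illustrative, not exhaustive:

```csharp
using System;
using System.Linq;

public static class QueryRouter
{
    // Specialized analytics types, checked in order; illustrative keywords only.
    private static readonly (string Type, string[] Keywords)[] Rules =
    {
        ("inactive_technicians", new[] { "inactive", "no activity", "hasn't worked" }),
        ("influx_requests",      new[] { "busiest", "influx", "volume by" }),
        ("top_request_areas",    new[] { "most common", "top categor", "ask about most" }),
        ("top_technicians",      new[] { "top", "best performing", "ranking" }),
    };

    public static string Classify(string query)
    {
        var q = query.ToLowerInvariant().Trim();

        // Conversational messages are checked first, as in the taxonomy.
        string[] smallTalk = { "hello", "hi", "hey", "help", "thanks", "bye" };
        if (smallTalk.Contains(q) || q.Length < 5)
            return "conversational";

        foreach (var (type, keywords) in Rules)
            if (keywords.Any(k => q.Contains(k)))
                return type;

        // Default to the most common type (~65% of queries).
        return "request_search";
    }
}
```

In ATLAS this routing would normally be performed by the Query Analysis Agent; the keyword version only approximates it for degraded operation.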
6. Prompt Engineering
6.1 Query Analysis Agent Prompt
The following is the proposed prompt template for the Query Analysis Agent:
// Proposed implementation - AnalyzeQueryWithAgent method
var instructions = $@"You are a query analysis agent for an IT service desk system.
CRITICAL: Today's date is {currentDate}. Yesterday was {yesterdayDate}.
Current time is {currentTime} UTC. This month started on {thisMonthStart:yyyy-MM-dd}.
Your task: Analyze the user query and return ONLY a valid JSON object
with NO explanations or markdown.
Schema:
{{
""queryType"": ""conversational|inactive_technicians|influx_requests|
top_request_areas|top_technicians|request_search"",
""isConversational"": boolean,
""conversationalIntent"": ""greeting|help|thanks|farewell|capabilities|
unclear|null"",
""dateFrom"": ""yyyy-MM-dd HH:mm or null"",
""dateTo"": ""yyyy-MM-dd HH:mm or null"",
""timeUnit"": ""hour|day or null"",
""topN"": number or null,
""subject"": ""string or null"",
""technician"": ""string or null"",
""technicians"": [""array or null""],
""requester"": ""string or null"",
""inactivityPeriod"": ""string or null (e.g., '14 days', '2 weeks')"",
""isUserRequest"": boolean,
""isUserTechnician"": boolean,
""status"": ""open|closed|null""
}}
=== CONVERSATIONAL MESSAGES (CHECK FIRST) ===
For greetings, help requests, thanks, or unclear messages, use queryType: ""conversational""
Examples:
""hello"", ""hi"" → {{""queryType"": ""conversational"", ""isConversational"": true,
""conversationalIntent"": ""greeting""}}
""help"", ""what can you do"" → {{""queryType"": ""conversational"",
""conversationalIntent"": ""help""}}
=== DATE PARSING RULES (CRITICAL) ===
'today' → {currentDate} 00:00 to {currentDate} 23:59
'yesterday' → {yesterdayDate} 00:00 to {yesterdayDate} 23:59
'this week' → {today.AddDays(-7):yyyy-MM-dd} 00:00 to {currentDate} 23:59
'this month' → {thisMonthStart:yyyy-MM-dd} 00:00 to {currentDate} 23:59
=== FOLLOW-UP QUERY HANDLING (VERY IMPORTANT) ===
If the conversation context shows a previous query, and the user asks a follow-up:
'how many of them are open' → Keep previous technician filter, ADD status: 'open'
'show me closed ones' → Keep previous filters, CHANGE status to 'closed'
ALWAYS preserve relevant filters from context for follow-up questions.
Output ONLY the JSON object. No markdown code blocks, no explanations.";
6.2 Conversation Agent Prompt
// Proposed implementation - GenerateConversationalResponseWithAgent method
var instructions = @"You are a friendly IT service desk assistant.
Generate a warm, conversational response that feels like talking to a
helpful colleague. Follow these guidelines:
TONE:
- Be warm and personable, not robotic
- Use natural language, not bullet-heavy lists
- Sound like a knowledgeable colleague sharing insights
STRUCTURE (2-4 paragraphs):
- Brief, natural acknowledgment of what they asked
- Key finding or number prominently displayed
- Highlight 3-5 notable items naturally in prose
- Offer to help further
EXAMPLES OF GOOD RESPONSES:
For ""how many tickets assigned to TechUser1 this month"":
""Looking at TechUser1's workload this month, I found 72 tickets assigned
to them. That's a solid amount of activity! The tickets cover a range
of areas including password resets, hardware requests, and software
installations. You can download the full breakdown in the CSV file.
Would you like me to filter these by status or category?""
AVOID:
- Starting with ""I processed your query successfully""
- Excessive bullet points
- Robotic language like ""Key findings:""
- Generic phrases like ""Here's what I found""
Remember: Sound human, be helpful, share insights naturally.";
6.3 Prompt Design Principles
| Principle | Implementation | Rationale |
|---|---|---|
| Temporal Grounding | Inject current date/time dynamically | Enables relative date parsing ("yesterday", "this week") |
| Schema Enforcement | Explicit JSON schema with examples | Projected to substantially reduce parsing errors |
| Negative Examples | "AVOID" section listing anti-patterns | Prevents common LLM verbosity issues |
| Context Injection | Structured conversation history format | Enables follow-up query understanding |
| Output Constraints | "ONLY return JSON, NO markdown" | Simplifies response parsing |
7. Conversation Context Management
7.1 Context Building Algorithm
// Proposed implementation - BuildConversationContext method
private string BuildConversationContext(ChatConversation conversation)
{
if (conversation?.Messages == null || conversation.Messages.Count < 2)
return string.Empty;
// Take last 10 messages for context (configurable)
var recentMessages = conversation.Messages
.OrderByDescending(m => m.SentAt)
.Take(10)
.OrderBy(m => m.SentAt) // Restore chronological order
.ToList();
var sb = new StringBuilder();
sb.AppendLine("=== CONVERSATION HISTORY ===");
foreach (var msg in recentMessages)
{
var role = msg.Role == "user" ? "USER" : "ASSISTANT";
var content = msg.Content;
// For agent messages, extract structured context from JSON
if (role == "ASSISTANT" && content.StartsWith("{"))
{
try
{
using var doc = JsonDocument.Parse(content); // JsonDocument is IDisposable
var root = doc.RootElement;
// Extract query analysis parameters
if (root.TryGetProperty("QueryAnalysis", out var analysisElem))
{
var queryType = analysisElem.TryGetProperty("queryType", out var qt)
? qt.GetString() : "";
var technician = analysisElem.TryGetProperty("technician", out var tech)
? tech.GetString() : "";
var status = analysisElem.TryGetProperty("status", out var st)
? st.GetString() : "";
var dateFrom = analysisElem.TryGetProperty("dateFrom", out var df)
? df.GetString() : "";
var dateTo = analysisElem.TryGetProperty("dateTo", out var dt)
? dt.GetString() : "";
sb.AppendLine($"[Previous Query: type={queryType}, " +
$"technician={technician}, status={status}, " +
$"period={dateFrom} to {dateTo}]");
}
// Extract technician names for pronoun resolution
if (root.TryGetProperty("Data", out var dataElem))
{
if (dataElem.TryGetProperty("TopTechnicians", out var topTechs))
{
var names = topTechs.EnumerateArray()
.Take(20)
.Select(t => t.GetProperty("Technician").GetString())
.Where(n => !string.IsNullOrEmpty(n))
.ToList();
sb.AppendLine($"[Technicians mentioned: {string.Join(", ", names)}]");
}
if (dataElem.TryGetProperty("RequestsFound", out var reqFound))
{
sb.AppendLine($"[Found {reqFound.GetInt32()} requests]");
}
}
// Include truncated conversational response
if (root.TryGetProperty("ConversationalResponse", out var resp))
{
content = resp.GetString() ?? "";
if (content.Length > 300)
content = content.Substring(0, 300) + "...";
}
}
catch { /* Use raw content on parse failure */ }
}
sb.AppendLine($"{role}: {content}");
}
sb.AppendLine("=== END HISTORY ===");
sb.AppendLine("\nIMPORTANT: Use this context to understand references like " +
"'them', 'those', 'the technicians', 'how many are open', etc.");
sb.AppendLine("If user asks a follow-up like 'how many of them are open', " +
"apply the previous filters PLUS the new 'open' status filter.");
return sb.ToString();
}
7.2 Follow-Up Query Resolution
REQUEST 1:
{
"query": "what tickets do i have assigned to me",
"sessionId": "",
"userEmail": "user@example.com"
}
RESPONSE 1:
{
"sessionId": "abc12345-1234-5678-abcd-123456789abc",
"conversationalResponse": "You have ~65 tickets assigned to you..."
}
REQUEST 2 (Follow-up):
{
"query": "how many of them are open",
"sessionId": "abc12345-1234-5678-abcd-123456789abc",
"userEmail": "user@example.com"
}
CONTEXT PASSED TO AGENT:
=== CONVERSATION HISTORY ===
[Previous Query: type=request_search, technician=null, status=null, period=2025-10-28 to 2025-11-27]
[isUserTechnician=true, userEmail=user@example.com]
[Found ~65 requests]
USER: what tickets do i have assigned to me
ASSISTANT: You have ~65 tickets assigned to you...
USER: how many of them are open
=== END HISTORY ===
RESOLUTION:
- "them" → previous result set (~65 tickets)
- Previous filter: isUserTechnician=true (preserved)
- New filter: status="open" (added)
RESPONSE 2:
{
"conversationalResponse": "You have ~15 tickets assigned to you... These are filtered to show only open tickets."
}
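The resolution step above amounts to overlaying the new turn's explicit fields onto the previous turn's analysis. A minimal sketch (the class below is an illustrative subset of the Section 6.1 schema, not the full ATLAS type):

```csharp
using System;

// Illustrative subset of the analysis fields from the Section 6.1 schema.
public class QueryAnalysis
{
    public string? QueryType, Technician, Status, DateFrom, DateTo;
    public bool IsUserTechnician;
}

public static class FollowUpResolver
{
    // Keep every previous filter unless the follow-up explicitly overrides it;
    // this is the "keep previous filters, ADD/CHANGE status" rule in prose form.
    public static QueryAnalysis Merge(QueryAnalysis previous, QueryAnalysis followUp) =>
        new QueryAnalysis
        {
            QueryType        = followUp.QueryType  ?? previous.QueryType,
            Technician       = followUp.Technician ?? previous.Technician,
            Status           = followUp.Status     ?? previous.Status,
            DateFrom         = followUp.DateFrom   ?? previous.DateFrom,
            DateTo           = followUp.DateTo     ?? previous.DateTo,
            IsUserTechnician = followUp.IsUserTechnician || previous.IsUserTechnician,
        };
}
```

Applied to the exchange above, the merged analysis would keep isUserTechnician=true and the previous date range while adding status=open.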
8. Data Synchronization Strategy
8.1 Background Sync Implementation
// Proposed implementation - NaturalQuery method
// Fire-and-forget background sync
_ = Task.Run(async () =>
{
try
{
using var scope = _serviceProvider.CreateScope();
var backgroundDbContext = scope.ServiceProvider
.GetRequiredService<AppDbContext>();
var backgroundRequestStorage = scope.ServiceProvider
.GetRequiredService<RequestStorageService>();
await SyncRequestsInBackgroundSafe(backgroundDbContext,
backgroundRequestStorage);
}
catch (Exception ex)
{
Console.WriteLine($"Background sync failed: {ex.Message}");
// Non-blocking - user query continues regardless
}
});
// Actual sync logic
private async Task SyncRequestsInBackgroundSafe(
AppDbContext dbContext,
RequestStorageService requestStorageService)
{
var lastStoredDate = await requestStorageService.GetLastStoredDateAsync();
// 5-minute overlap for safety
DateTimeOffset dateFrom = lastStoredDate.HasValue
? lastStoredDate.Value.AddMinutes(-5)
: DateTimeOffset.UtcNow.AddMonths(-1);
var dateTo = DateTimeOffset.UtcNow;
var requests = await FetchRequestsForDateRange(dateFrom, dateTo);
foreach (var req in requests)
{
var requestId = req["id"].ToString();
if (!await requestStorageService.RequestExistsAsync(requestId))
{
await requestStorageService.StoreRequestAsync(req);
}
}
Console.WriteLine($"Background sync completed: Fetched {requests.Count} requests");
}
8.2 Hybrid Architecture
Figure 3: Hybrid data synchronization architecture
8.3 OAuth Token Caching
// Proposed implementation - GetAccessTokenAsync method
private async Task<string> GetAccessTokenAsync()
{
const string tokenCacheKey = "ZohoAccessToken";
const string expirationCacheKey = "ZohoTokenExpiration";
// Check cache first
if (_cache.TryGetValue(tokenCacheKey, out string cachedToken) &&
_cache.TryGetValue(expirationCacheKey, out DateTime cachedExpiration) &&
DateTime.UtcNow < cachedExpiration)
{
return cachedToken; // Return cached token
}
// Refresh token
var client = _httpClientFactory.CreateClient();
var formContent = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("refresh_token", _refreshToken),
new KeyValuePair<string, string>("grant_type", "refresh_token"),
new KeyValuePair<string, string>("client_id", _clientId),
new KeyValuePair<string, string>("client_secret", _clientSecret),
new KeyValuePair<string, string>("redirect_uri", _redirectUri)
});
var response = await client.PostAsync(
"https://accounts.zoho.com/oauth/v2/token", formContent);
response.EnsureSuccessStatusCode();
using var json = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
string accessToken = json.RootElement.GetProperty("access_token").GetString();
int expiresIn = json.RootElement.TryGetProperty("expires_in", out var exp)
? exp.GetInt32()
: 3600; // Default to 1 hour if the field is absent
// Cache with 60-second buffer before actual expiration
var expiration = DateTime.UtcNow.AddSeconds(expiresIn - 60);
_cache.Set(tokenCacheKey, accessToken,
new MemoryCacheEntryOptions { AbsoluteExpiration = expiration });
_cache.Set(expirationCacheKey, expiration,
new MemoryCacheEntryOptions { AbsoluteExpiration = expiration });
return accessToken;
}
9. Query Processing Engine
9.1 Status Normalization
// Proposed implementation - ApplyStatusFilter method
private IQueryable<ItsmTicket> ApplyStatusFilter(
IQueryable<ItsmTicket> query,
string statusFilter)
{
var lowerStatus = statusFilter.ToLower();
if (lowerStatus == "open")
{
return query.Where(r =>
r.Status.ToLower() == "open" ||
r.Status.ToLower() == "in progress" ||
r.Status.ToLower() == "pending" ||
r.Status.ToLower().Contains("open") ||
// Also check JSON data for nested status
r.JsonData.Contains("\"Status\":\"Open\"") ||
r.JsonData.Contains("\"Status\":\"In Progress\"") ||
r.JsonData.Contains("\"Status\":\"Pending\"")
);
}
else if (lowerStatus == "closed")
{
return query.Where(r =>
r.Status.ToLower() == "closed" ||
r.Status.ToLower() == "resolved" ||
r.Status.ToLower() == "completed" ||
r.Status.ToLower().Contains("closed") ||
r.JsonData.Contains("\"Status\":\"Closed\"") ||
r.JsonData.Contains("\"Status\":\"Resolved\"") ||
r.JsonData.Contains("\"Status\":\"Completed\"")
);
}
return query;
}
9.2 Personalization ("My Tickets") Implementation
// Proposed implementation - GetRequestSearchData method
// Enhanced personalization filtering - search in JsonData
if (analysis.IsUserTechnician && !string.IsNullOrEmpty(userEmail))
{
// Multi-location search for technician email
query = query.Where(r =>
r.TechnicianEmail == userEmail ||
r.JsonData.Contains($"\"email_id\":\"{userEmail}\"") ||
r.JsonData.Contains(userEmail)
);
}
else if (analysis.IsUserRequest && !string.IsNullOrEmpty(userEmail))
{
// Search for requester email
query = query.Where(r =>
r.RequesterEmail == userEmail ||
r.JsonData.Contains($"\"email_id\":\"{userEmail}\"") ||
r.JsonData.Contains(userEmail)
);
}
// Post-processing verification for edge cases
if (analysis.IsUserTechnician && !string.IsNullOrEmpty(userEmail))
{
requests = requests.Where(r =>
{
// Check direct column match
if (r.TechnicianEmail?.Equals(userEmail,
StringComparison.OrdinalIgnoreCase) == true)
return true;
// Check JsonData for technician email
if (!string.IsNullOrEmpty(r.JsonData))
{
try
{
var data = JsonSerializer.Deserialize<ItsmTicketData>(r.JsonData);
if (data?.Technician?.EmailId?.Equals(userEmail,
StringComparison.OrdinalIgnoreCase) == true)
return true;
}
catch { }
}
return false;
}).ToList();
}
9.3 Fallback Heuristic Parser
When AI agents fail or timeout, the system would fall back to keyword-based parsing:
// Proposed implementation - FallbackHeuristicAnalysis method
private async Task<QueryAnalysis> FallbackHeuristicAnalysis(
string userQuery,
string userEmail = "",
string conversationContext = "")
{
var query = userQuery.ToLowerInvariant().Trim();
var now = DateTime.UtcNow;
var today = now.Date;
// Check for conversational intents first
var greetings = new[] { "hello", "hi", "hey", "good morning" };
var helpKeywords = new[] { "help", "what can you do", "?" };
if (greetings.Any(g => query == g || query.StartsWith(g + " ")))
{
return new QueryAnalysis
{
QueryType = "conversational",
IsConversational = true,
ConversationalIntent = "greeting"
};
}
var analysis = new QueryAnalysis
{
QueryType = "request_search",
IsConversational = false
};
// Determine query type from keywords
if (query.Contains("inactive") || query.Contains("no activity"))
analysis.QueryType = "inactive_technicians";
else if (query.Contains("influx") || query.Contains("busiest"))
analysis.QueryType = "influx_requests";
else if (query.Contains("top tech") || query.Contains("ranking"))
analysis.QueryType = "top_technicians";
// Status detection
if (query.Contains("open"))
analysis.Status = "open";
else if (query.Contains("closed") || query.Contains("resolved"))
analysis.Status = "closed";
// Date handling
if (query.Contains("yesterday"))
{
analysis.DateFrom = today.AddDays(-1).ToString("yyyy-MM-dd") + " 00:00";
analysis.DateTo = today.AddDays(-1).ToString("yyyy-MM-dd") + " 23:59";
}
else if (query.Contains("this month"))
{
var monthStart = new DateTime(now.Year, now.Month, 1);
analysis.DateFrom = monthStart.ToString("yyyy-MM-dd") + " 00:00";
analysis.DateTo = today.ToString("yyyy-MM-dd") + " 23:59";
}
// ... additional date patterns ...
// Parse context for follow-up queries
if (!string.IsNullOrEmpty(conversationContext) &&
(query.Contains("them") || query.Contains("those")))
{
var techMatch = Regex.Match(conversationContext,
@"technician=([^,\]]+)");
if (techMatch.Success && techMatch.Groups[1].Value != "null")
{
analysis.Technician = techMatch.Groups[1].Value.Trim();
}
}
return analysis;
}
10. Concurrency & Thread Safety
10.1 The Concurrency Problem
Azure AI Agents (and similar platforms) restrict concurrent operations on the same thread:
ERROR: "Can't add message to thread_xyz while a run is active"
This occurs when multiple requests attempt to use the same conversation thread simultaneously.
10.2 Solution: Per-Thread Semaphores
// Proposed implementation
// Static dictionary of locks, one per thread
private static readonly Dictionary<string, SemaphoreSlim> _threadLocks = new();
private static readonly object _lockDictLock = new();
private SemaphoreSlim GetThreadLock(string threadId)
{
lock (_lockDictLock) // Thread-safe dictionary access
{
if (!_threadLocks.ContainsKey(threadId))
{
// Create semaphore allowing 1 concurrent access
_threadLocks[threadId] = new SemaphoreSlim(1, 1);
}
return _threadLocks[threadId];
}
}
// Usage in NaturalQuery endpoint
var threadLock = GetThreadLock(threadId);
// Acquire lock with timeout
if (!await threadLock.WaitAsync(TimeSpan.FromSeconds(90)))
{
return StatusCode(503, new
{
Error = "System is busy processing a previous request.",
ConversationalResponse = "I'm currently processing your previous " +
"request. Please wait a moment and try again."
});
}
try
{
// Wait for any active runs to complete
await WaitForActiveRunsToComplete(agentsClient, threadId);
// Process query safely
var queryAnalysis = await AnalyzeQueryWithAgent(...);
// ... rest of processing ...
}
finally
{
threadLock.Release(); // Always release
}
10.3 Active Run Detection
// Proposed implementation - WaitForActiveRunsToComplete method
private async Task WaitForActiveRunsToComplete(
PersistentAgentsClient client,
string threadId,
int maxWaitSeconds = 60)
{
var startTime = DateTime.UtcNow;
while ((DateTime.UtcNow - startTime).TotalSeconds < maxWaitSeconds)
{
try
{
var runsAsync = client.Runs.GetRunsAsync(threadId, limit: 10);
var hasActiveRun = false;
await foreach (var run in runsAsync)
{
if (run.Status == RunStatus.InProgress ||
run.Status == RunStatus.Queued ||
run.Status == RunStatus.RequiresAction)
{
hasActiveRun = true;
Console.WriteLine($"Waiting for active run {run.Id} " +
$"with status {run.Status}...");
break;
}
}
if (!hasActiveRun)
return; // Safe to proceed
await Task.Delay(1000); // Poll every 1 second
}
catch (Exception ex)
{
Console.WriteLine($"Error checking run status: {ex.Message}");
await Task.Delay(500);
}
}
Console.WriteLine($"Timeout waiting for active runs on thread {threadId}");
}
10.4 Retry with Exponential Backoff
// Proposed implementation - RunAgentAsync method
private async Task<List<string>> RunAgentAsync(
PersistentAgentsClient client,
string threadId,
string agentId,
string userMessage,
string additionalInstructions)
{
var responses = new List<string>();
int maxRetries = 3;
int currentRetry = 0;
while (currentRetry < maxRetries)
{
try
{
// Add message to thread
await client.Messages.CreateMessageAsync(
threadId, MessageRole.User, userMessage);
// Create and run the agent
var runResponse = await client.Runs.CreateRunAsync(
threadId, agentId,
additionalInstructions: additionalInstructions);
var run = runResponse.Value;
// Poll for completion (max 75 seconds)
var start = DateTime.UtcNow;
var maxDuration = TimeSpan.FromSeconds(75);
while (run.Status == RunStatus.Queued ||
run.Status == RunStatus.InProgress)
{
if (DateTime.UtcNow - start > maxDuration)
{
responses.Add("Agent timeout.");
return responses;
}
await Task.Delay(750);
run = (await client.Runs.GetRunAsync(threadId, run.Id)).Value;
}
// Get response messages
var messagesAsync = client.Messages.GetMessagesAsync(
threadId, order: ListSortOrder.Descending);
await foreach (var message in messagesAsync)
{
if (message.Role == MessageRole.Agent)
{
foreach (var content in message.ContentItems)
{
if (content is MessageTextContent textContent)
{
responses.Add(textContent.Text);
}
}
break;
}
}
return responses;
}
catch (RequestFailedException rfe)
when (rfe.Message.Contains("while a run") &&
rfe.Message.Contains("is active"))
{
// Thread busy - exponential backoff
currentRetry++;
Console.WriteLine($"Thread busy, retry {currentRetry}/{maxRetries}");
await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, currentRetry))); // 2s, 4s, 8s
await WaitForActiveRunsToComplete(client, threadId);
}
catch (Exception ex)
{
responses.Add($"Error: {ex.Message}");
return responses;
}
}
responses.Add("Failed after maximum retries.");
return responses;
}
11. Data Transformation Pipeline
11.1 JSON Flattening for Export
// Proposed implementation - ParseRequestDetailsFromFlatJson method
private Dictionary<string, object> ParseRequestDetailsFromFlatJson(
string jsonData,
dynamic basicRequest)
{
var result = new Dictionary<string, object>();
// Extract meaningful value from nested JSON objects
string ExtractValueFromObject(JsonElement element, string fieldName)
{
// For time fields, prefer display_value
if (element.TryGetProperty("display_value", out var displayValue))
return displayValue.GetString() ?? "";
// For entities (requester, technician), prefer name + email
if (element.TryGetProperty("name", out var nameValue))
{
var name = nameValue.GetString() ?? "";
if ((fieldName.Contains("requester") ||
fieldName.Contains("technician")) &&
element.TryGetProperty("email_id", out var emailValue))
{
var email = emailValue.GetString();
if (!string.IsNullOrEmpty(email) && !string.IsNullOrEmpty(name))
return $"{name} ({email})";
}
return name;
}
return "";
}
// Strip HTML from description
string StripHtml(string html)
{
if (string.IsNullOrEmpty(html)) return "";
var text = Regex.Replace(html, "<[^>]*>", " ");
// Decode common HTML entities left behind after tag removal
text = text.Replace("&nbsp;", " ")
.Replace("&amp;", "&")
.Replace("&lt;", "<")
.Replace("&gt;", ">");
// Collapse runs of whitespace introduced by tag removal
return Regex.Replace(text, @"\s+", " ").Trim();
}
try
{
var jsonDoc = JsonDocument.Parse(jsonData);
foreach (var property in jsonDoc.RootElement.EnumerateObject())
{
var key = property.Name;
var value = property.Value;
object parsedValue;
switch (value.ValueKind)
{
case JsonValueKind.String:
parsedValue = value.GetString() ?? "";
break;
case JsonValueKind.Object:
parsedValue = ExtractValueFromObject(value, key.ToLower());
break;
case JsonValueKind.Array:
var items = value.EnumerateArray()
.Select(item => item.ValueKind == JsonValueKind.Object
? ExtractValueFromObject(item, key.ToLower())
: item.GetString() ?? "")
.Where(s => !string.IsNullOrEmpty(s));
parsedValue = string.Join(", ", items);
break;
default:
parsedValue = value.ToString();
break;
}
if (key.Equals("Description", StringComparison.OrdinalIgnoreCase))
parsedValue = StripHtml(parsedValue?.ToString() ?? "");
var formattedKey = FormatColumnName(key);
result[formattedKey] = parsedValue;
}
}
catch (Exception ex)
{
Console.WriteLine($"Error parsing JSON: {ex.Message}");
}
return result;
}
11.2 Dynamic CSV Generation
// Proposed implementation - GenerateDynamicCsvFromData method (excerpt)
// Determine which columns have at least one non-empty value
var columnsWithData = new HashSet<string>();
foreach (var col in allColumns)
{
foreach (var req in allRequests)
{
if (req.TryGetValue(col, out var val) && HasValue(val))
{
columnsWithData.Add(col);
break;
}
}
}
// Define preferred column order
var preferredOrder = new[]
{
"Request ID", "Subject", "Description", "Status", "Technician",
"Requester Name", "Created Date", "Due by date", "Priority",
"Category", "Sub Category", "Resolution"
};
// Order: prefix columns → preferred → alphabetical remaining
var orderedColumns = new List<string>();
foreach (var col in prefixColumns.Where(c => columnsWithData.Contains(c)))
orderedColumns.Add(col);
foreach (var col in preferredOrder.Where(c => columnsWithData.Contains(c)))
if (!orderedColumns.Contains(col)) orderedColumns.Add(col);
orderedColumns.AddRange(columnsWithData.Except(orderedColumns).OrderBy(c => c));
// Build CSV
var csvBuilder = new StringBuilder();
csvBuilder.AppendLine(string.Join(",",
orderedColumns.Select(c => $"\"{Safe(c)}\"")));
foreach (var req in allRequests)
{
var values = orderedColumns.Select(col =>
$"\"{(req.TryGetValue(col, out var v) ? Safe(v) : "")}\"");
csvBuilder.AppendLine(string.Join(",", values));
}
return Encoding.UTF8.GetBytes(csvBuilder.ToString());
12. Security Considerations
12.1 Authentication Architecture
The system implements a multi-layered authentication strategy that ensures secure access across all integration points.
Layer 1 - Client Authentication: Client applications authenticate with the ATLAS API using organization-specific Bearer Tokens or API Keys, which are validated through middleware in the ASP.NET Core pipeline. This layer extracts and passes user identity as UserEmail for personalization and authorization.
Layer 2 - Azure AI Platform: ATLAS authenticates with the Azure AI Platform using DefaultAzureCredential with Managed Identity, eliminating the need for secrets in code or configuration files while benefiting from automatic token rotation managed by Azure.
Layer 3 - External ITSM API: ATLAS authenticates with the external ITSM API through the OAuth 2.0 Refresh Token Flow. Sensitive credentials are stored in Azure Key Vault or secure configuration, and access tokens are cached in-memory with a 60-second expiration buffer so that a token cannot expire mid-operation.
This layered approach provides defense in depth, with each layer employing appropriate authentication mechanisms for its specific security context and requirements.
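To illustrate the Layer 3 token caching described above, here is a minimal sketch of an access-token cache with an expiration buffer (Python for brevity; the `fetch_token` callable and the token lifetime used in the test are hypothetical):

```python
import time

class TokenCache:
    """Caches an OAuth access token, treating it as expired `buffer_seconds`
    before its reported expiry so it cannot lapse mid-operation."""

    def __init__(self, fetch_token, buffer_seconds: float = 60.0):
        self._fetch = fetch_token      # callable -> (token, expires_in_seconds)
        self._buffer = buffer_seconds
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = time.monotonic()
        if self._token is None or now >= self._expires_at:
            token, expires_in = self._fetch()
            self._token = token
            # Expire the cached token early by the buffer amount.
            self._expires_at = now + expires_in - self._buffer
        return self._token
```

The same pattern applies regardless of language: the only state needed is the token and its buffered expiry timestamp, and refreshes happen transparently on `get()`.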
12.2 Input Validation
// Path traversal prevention for file downloads
[HttpGet("download-result/{sessionId}/{fileName}")]
public async Task<IActionResult> DownloadResult(string sessionId, string fileName)
{
// Prevent path traversal attacks
if (fileName.Contains("..") ||
fileName.Contains("/") ||
fileName.Contains("\\"))
{
return BadRequest("Invalid file name.");
}
// Verify the session exists; a full ownership check would additionally
// compare the caller's UserEmail (supplied by the authentication layer)
var conversation = await _dbContext.ChatConversations
.FirstOrDefaultAsync(c => c.SessionId == sessionId);
if (conversation == null)
return NotFound("Conversation not found.");
// ... proceed with download ...
}
12.3 Data Privacy Controls
| Control | Implementation | Purpose |
|---|---|---|
| User Scoping | UserEmail filter on personalized queries | Prevent cross-user data access |
| Session Isolation | SessionId required for history retrieval | Prevent conversation leakage |
| Data Minimization | CSV exports only contain queried data | Reduce exposure surface |
| JSON Sanitization | HTML stripped from descriptions | Prevent XSS in exports |
| Audit Trail | All queries logged with timestamps | Compliance and debugging |
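The User Scoping control above amounts to filtering every personalized query by the caller's identity. A minimal sketch of that filter (Python; the ticket record shape and field names are hypothetical):

```python
def scope_to_user(tickets, user_email, as_technician=False):
    """Return only the tickets the caller may see: tickets they requested,
    or (when acting as a technician) tickets assigned to them."""
    field = "technician_email" if as_technician else "requester_email"
    # Case-insensitive match, since email casing varies across systems.
    return [t for t in tickets if t.get(field, "").lower() == user_email.lower()]
```

Applying this filter server-side, before results reach the response generator, is what prevents cross-user data access even if a query tries to reference another user's tickets.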
13. Evaluation & Results
13.1 Projected Query Understanding Accuracy
Methodology: Based on pilot testing and analysis of similar NLP systems
| Query Type | Projected Accuracy | Notes |
|---|---|---|
| Conversational | ~99% | Greetings, help requests |
| Request Search | ~94-96% | Core functionality |
| Top Technicians | ~95-97% | Aggregation queries |
| Inactive Technicians | ~93-95% | Period parsing challenges |
| Influx Analysis | ~91-93% | TimeUnit interpretation |
| Follow-up Queries | ~85-90% | Context retention dependent |
| Overall Target | ~90-95% | - |
13.2 Projected Follow-Up Query Success Rate
| Follow-up Type | Projected Rate | Example |
|---|---|---|
| Status filter addition | ~94% | "how many are open" |
| Date range change | ~90% | "what about last week" |
| Technician reference | ~86% | "show their tickets" |
| Multi-hop reference | ~70-75% | "how many of those were resolved" |
13.3 Hypothetical User Satisfaction Targets
Target Survey Outcomes:
| Question | Target: Agree/Strongly Agree |
|---|---|
| "ATLAS understands my questions" | > 75% |
| "ATLAS saves me time" | > 85% |
| "Response quality meets my needs" | > 70% |
| "I would recommend ATLAS" | > 85% |
Target Net Promoter Score (NPS): > +50 (Good to Excellent)
13.4 Projected Operational Metrics
| Metric | Target Value |
|---|---|
| Estimated queries/day | 400-600 |
| Target unique users | 50-100+ |
| System availability target | > 99.5% |
| Fallback activation rate | < 5% |
| Average response time | < 5 seconds |
| CSV export usage | 20-30% of queries |
14. Limitations
14.1 Known Constraints
| Limitation | Impact | Workaround | Priority to Fix |
|---|---|---|---|
| Single-language support | English only | None currently | Medium |
| No ticket creation | Read-only queries | Users must use ITSM directly | High |
| ~5-minute data latency | Not real-time | Background sync frequency tunable | Low |
| Complex boolean queries | "Open OR pending AND network" may fail | Rephrase as simpler queries | Medium |
| Cross-conversation context | New session loses history | Use same sessionId | Low |
| Attachment handling | Cannot search attachment contents | Not planned | Low |
14.2 Anticipated Scalability Limits
| Dimension | Estimated Limit | Behavior at Limit |
|---|---|---|
| Concurrent users | ~50 | Response time degrades ~30% |
| Queries per minute | ~100 | AI API rate limiting triggers |
| Conversation length | ~100 messages | Context truncation to last 10 |
| Result set size | ~500 tickets | Hard cap; pagination not implemented |
| Background sync batch | ~10,000 tickets | Memory pressure; batching required |
14.3 AI Model Dependencies
- Model availability: Dependent on Azure AI platform uptime (99.9% SLA)
- Model changes: GPT-4 behaviour changes could affect prompt effectiveness
- Cost volatility: API pricing changes could impact operational costs
- Latency variance: AI response times vary unpredictably between 0.5 and 5 seconds
15. Architecture Decision Records
ADR-001: Multi-Agent vs Single-Agent Architecture
Status: Accepted
Date: 2025-10-15
Context: Need to process natural language queries with high accuracy and maintainability.
Decision: Use three specialized agents instead of one general-purpose agent.
Consequences:
- (+) Clear separation of concerns
- (+) Easier debugging and prompt tuning
- (+) Lower per-agent prompt complexity
- (-) Higher latency (sequential calls)
- (-) More complex orchestration logic
Alternatives Considered:
- Single agent with long prompt: Rejected due to prompt complexity and debugging difficulty
- Two agents (analysis + response): Rejected due to missing validation step
ADR-002: Local Database Cache vs Direct API Queries
Status: Accepted
Date: 2025-10-18
Context: User queries require fast response times; external ITSM API has 2-5 second latency.
Decision: Cache ITSM data locally with background sync.
Consequences:
- (+) Sub-100ms query latency
- (+) Complex aggregations possible
- (+) Resilience to ITSM API outages
- (-) Data freshness delay (up to 5 minutes)
- (-) Storage overhead (~500MB for 50K tickets)
Alternatives Considered:
- Direct API queries: Rejected due to latency requirements
- Redis cache: Considered for future distributed deployment
ADR-003: Semaphore-Based Thread Locking
Status: Accepted
Date: 2025-11-01
Context: Azure AI Agents throw errors when multiple operations occur on same thread.
Decision: Implement per-thread SemaphoreSlim with dictionary lookup.
Consequences:
- (+) Prevents concurrent access errors
- (+) Graceful 503 response on timeout
- (-) Memory overhead for semaphore dictionary
- (-) Potential deadlock risk (mitigated by timeout)
Alternatives Considered:
- Global lock: Rejected due to throughput impact
- Thread-per-user: Rejected due to Azure thread limits
16. Future Work
16.1 Short-Term
| Enhancement | Complexity | Impact | Status |
|---|---|---|---|
| Redis distributed caching | Medium | Horizontal scaling | Planned |
| Real-time notifications | Medium | Proactive alerts | Planned |
| Voice input support | Low | Accessibility | Backlog |
| Mobile-optimized UI | Medium | User adoption | Backlog |
16.2 Medium-Term
| Enhancement | Complexity | Impact | Status |
|---|---|---|---|
| Ticket creation via NL | High | Bidirectional workflow | Research |
| Predictive SLA breach alerts | High | Proactive management | Research |
| Multi-language support | Medium | Global deployment | Backlog |
| Custom report scheduling | Medium | Automation | Backlog |
16.3 Long-Term
| Enhancement | Complexity | Impact | Status |
|---|---|---|---|
| Autonomous ticket triage | Very High | AI operations | Concept |
| Knowledge base integration | High | Auto-resolution | Concept |
| Fine-tuned domain model | Very High | Accuracy improvement | Research |
| Cross-system analytics | High | Enterprise insights | Concept |
17. Conclusion
ATLAS demonstrates that natural language interfaces for enterprise ITSM systems are not only feasible but could deliver significant operational value. The key architectural decisions enabling this potential include:
- Multi-Agent Pipeline: Separating query understanding, validation, and response generation is projected to improve accuracy (target: 90-95%) and maintainability
- Context-Aware Conversations: Structured history management could enable natural follow-up queries (target: 85-90% success rate)
- Hybrid Data Architecture: Background synchronization is designed to provide sub-5-second response times while maintaining data freshness
- Graceful Degradation: Heuristic fallbacks are intended to ensure high query success rates despite AI service variability
- Enterprise-Ready Concurrency: Thread-safe agent orchestration designed for multi-user workloads
The system is projected to achieve positive ROI through time-to-insight reduction compared to traditional report generation methods.
Key Takeaways for Practitioners:
- Multi-agent architectures trade latency for accuracy and maintainability
- Context preservation is essential for natural conversation flow
- Fallback mechanisms are critical in enterprise LLM systems
- Cost modeling should include AI API expenses early in design
ATLAS represents a conceptual template for enterprise NLI systems that balance sophistication with pragmatic engineering constraints. The architecture and patterns described in this whitepaper are intended to guide organizations exploring similar solutions.
18. References
Multi-Agent Systems & LLM Frameworks
- Wu, Q., Bansal, G., Zhang, J., Wu, Y., Zhang, S., Zhu, E., Li, B., Jiang, L., Zhang, X., & Wang, C. (2023). "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation." arXiv:2308.08155. https://arxiv.org/abs/2308.08155
- LangChain. (2024). "LangGraph: Multi-Agent Workflows." LangChain Blog, January 2024. https://blog.langchain.com/langgraph-multi-agent-workflows/
- LangChain. (2024). "Command: A New Tool for Building Multi-Agent Architectures in LangGraph." December 2024. https://blog.langchain.com/command-a-new-tool-for-multi-agent-architectures-in-langgraph/
- Microsoft Research. (2025). "AutoGen v0.4 Release." January 2025. https://www.microsoft.com/en-us/research/project/autogen/
Text-to-SQL & Benchmarks
- Li, J., Hui, B., Qu, G., Yang, J., Li, B., Li, B., Wang, B., Qin, B., Geng, R., Huo, N., et al. (2024). "Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs (BIRD)." NeurIPS 2023. https://bird-bench.github.io/
- Lei, F., Chen, J., Ye, Y., Cao, R., Shin, D., Su, H., et al. (2024). "Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows." arXiv:2411.07763. https://spider2-sql.github.io/
- Ma, L., Pu, K., & Zhu, Y. (2024). "Evaluating LLMs for Text-to-SQL Generation With Complex SQL Workload." arXiv:2407.19517. https://arxiv.org/abs/2407.19517
Retrieval-Augmented Generation
- Edge, D., et al. (2024). "GraphRAG: Unlocking LLM Discovery on Narrative Private Data." Microsoft Research, 2024.
- Sarthi, P., et al. (2024). "RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval." ICLR 2024.
- Niu, X., et al. (2024). "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models." ACL 2024.
- Ranjan, R., et al. (2024). "A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions." arXiv:2410.12837. https://arxiv.org/abs/2410.12837
Commercial ITSM AI Solutions
- ServiceNow. (2024). "Now Platform Xanadu Release: Actionable AI." September 2024. https://www.servicenow.com/blogs/2024/now-platform-xanadu-release-actionable-ai
- ServiceNow. (2024). "Now Assist Documentation." https://www.servicenow.com/platform/now-assist.html
- Freshworks. (2024). "Introduction to Freddy AI Agent." October 2024. https://support.freshservice.com/support/solutions/articles/50000010306-introduction-to-freddy-ai-agent
- Freshworks. (2024). "Freddy AI Copilot." https://www.freshworks.com/freshdesk/omni/freddy-ai-copilot/
- Zendesk. (2024). "Announcing General Availability of Generative AI Features for Agents." March 2024. https://support.zendesk.com/hc/en-us/articles/6806752620314
- Zendesk. (2024). "About AI Agents." https://support.zendesk.com/hc/en-us/articles/6970583409690-About-AI-agents
- Zendesk. (2024). "Enhanced Generative AI Features with ChatGPT-4o." https://support.zendesk.com/hc/en-us/articles/7711631447450
Platform Documentation
- Microsoft. (2024). "Azure AI Agent Service Documentation." https://learn.microsoft.com/en-us/azure/ai-services/agents/
- Microsoft. (2024). "Retrieval Augmented Generation (RAG) in Azure AI Search." https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview
Industry Research
- McKinsey & Company. (2024). "What is RAG (Retrieval Augmented Generation)." October 2024. https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-retrieval-augmented-generation-rag
- Forrester. (2024). "Forrester's Guide to Retrieval-Augmented Generation." November 2024. https://www.forrester.com/blogs/forresters-guide-to-retrieval-augmented-generation-rag/
- LangChain. (2024). "Top 5 LangGraph Agents in Production 2024." December 2024. https://blog.langchain.com/top-5-langgraph-agents-in-production-2024/
Foundational Work (for historical context)
- ITIL Foundation. (2019). "ITIL 4 Foundation." Axelos.
19. Appendices
Appendix A: API Reference
POST /api/natural-query
Request:
{
"Query": "string (required)",
"SessionId": "string (optional, GUID)",
"UserEmail": "string (optional, for personalization)"
}
Response (Success):
{
"SessionId": "5ffe2d39-0a8f-43b4-b603-bed01492620f",
"ThreadId": "thread_ysZ6pR5HzCqEVSEef8T63DGh",
"ConversationalResponse": "Looking at daily trends...",
"ExcelFile": {
"FileName": "queryresult_20251127142004.csv",
"Url": "/api/Main/download-result/{sessionId}/{fileName}"
},
"Summary": {
"totalRequests": 2310,
"timeUnit": "Day"
}
}
Response (Busy):
{
"Error": "System is busy processing a previous request.",
"ConversationalResponse": "I'm currently processing your previous request..."
}
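For reference, a minimal client sketch for this endpoint (Python standard library only; `base_url` and bearer-token handling are deployment-specific assumptions, not part of the specification above):

```python
import json
import urllib.request

def build_query_payload(query, session_id=None, user_email=None):
    """Assemble the request body for POST /api/natural-query,
    omitting the optional fields when they are not supplied."""
    body = {"Query": query}
    if session_id:
        body["SessionId"] = session_id
    if user_email:
        body["UserEmail"] = user_email
    return body

def natural_query(base_url, query, session_id=None, user_email=None, token=None):
    """Send the query; the Authorization header is a deployment-specific assumption."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(
        f"{base_url}/api/natural-query",
        data=json.dumps(build_query_payload(query, session_id, user_email)).encode(),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Reusing the returned `SessionId` on subsequent calls is what enables the follow-up query handling described in Section 13.2.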
Appendix B: Query Analysis Schema
{
"queryType": "conversational|inactive_technicians|influx_requests|top_request_areas|top_technicians|request_search",
"isConversational": "boolean",
"conversationalIntent": "greeting|help|thanks|farewell|capabilities|unclear|null",
"dateFrom": "yyyy-MM-dd HH:mm|null",
"dateTo": "yyyy-MM-dd HH:mm|null",
"timeUnit": "hour|day|null",
"topN": "number|null",
"subject": "string|null",
"technician": "string|null",
"technicians": "[string]|null",
"requester": "string|null",
"inactivityPeriod": "string|null",
"isUserRequest": "boolean",
"isUserTechnician": "boolean",
"status": "open|closed|null"
}
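Because the agent's output is free-form JSON, a validation pass over this schema is advisable before downstream processing. A minimal sketch (Python; the fallback-to-`request_search` policy shown here is an assumption consistent with the paper's heuristic-degradation design, not a documented behavior):

```python
import json

# The queryType values enumerated in the schema above.
ALLOWED_QUERY_TYPES = {
    "conversational", "inactive_technicians", "influx_requests",
    "top_request_areas", "top_technicians", "request_search",
}

def validate_analysis(raw: str) -> dict:
    """Parse the agent's JSON output; coerce unknown queryType values
    to the safest default (hypothetical fallback policy)."""
    analysis = json.loads(raw)
    if analysis.get("queryType") not in ALLOWED_QUERY_TYPES:
        analysis["queryType"] = "request_search"
    return analysis
```

A stricter variant could also type-check fields such as `topN` and the date strings, rejecting the analysis entirely and triggering the heuristic parser described earlier.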
Appendix C: Hypothetical Query Examples
| Query | Analysis | Expected Result |
|---|---|---|
| "request volume this week" | influx_requests, timeUnit=day | ~2,000+ requests, peak mid-week |
| "what tickets do i have assigned to me" | request_search, isUserTechnician=true | User's assigned tickets |
| "how many of them are open" (follow-up) | request_search, isUserTechnician=true, status=open | Filtered to open status |
| "how many tickets assigned to TechUser1 this month" | request_search, technician=TechUser1 | ~100-150 tickets |
| "how many involved network" (follow-up) | request_search, technician=TechUser1, subject=network | Filtered subset |
| "top technicians based on requests handled past week" | top_technicians, topN=10 | Ranked list by volume |
| "technicians with no requests treated in the past 1 month" | inactive_technicians, inactivityPeriod=30 days | List of inactive techs |
This whitepaper presents a conceptual architecture for ATLAS. The design patterns, code examples, and projected metrics documented here are intended to guide similar implementations in enterprise environments. All examples use hypothetical data and anonymized placeholders, and all reported figures are modelled projections from simulated workloads rather than measured production results.
