<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Non-Brand Data]]></title><description><![CDATA[Non-Brand Data helps data professionals build practical ML, GenAI, and analytics judgment through structured essays, field guides, templates, and applied workflows.]]></description><link>https://www.nb-data.com</link><image><url>https://substackcdn.com/image/fetch/$s_!06DP!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6c0e1cde-d120-4029-8ffd-2a8c7c6e4504_1280x1280.png</url><title>Non-Brand Data</title><link>https://www.nb-data.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 13 Jun 2026 17:43:49 GMT</lastBuildDate><atom:link href="https://www.nb-data.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Cornellius Yudha Wijaya]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[cornellius@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[cornellius@substack.com]]></itunes:email><itunes:name><![CDATA[Cornellius Yudha Wijaya]]></itunes:name></itunes:owner><itunes:author><![CDATA[Cornellius Yudha Wijaya]]></itunes:author><googleplay:owner><![CDATA[cornellius@substack.com]]></googleplay:owner><googleplay:email><![CDATA[cornellius@substack.com]]></googleplay:email><googleplay:author><![CDATA[Cornellius Yudha Wijaya]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow]]></title><description><![CDATA[A concise guide to achieve success in your production GenAI 
generation]]></description><link>https://www.nb-data.com/p/from-prompt-to-reliable-output-a</link><guid isPermaLink="false">https://www.nb-data.com/p/from-prompt-to-reliable-output-a</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Sun, 31 May 2026 07:47:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!C4_t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C4_t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C4_t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!C4_t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!C4_t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!C4_t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!C4_t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1763111,&quot;alt&quot;:&quot;From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/199950174?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow" title="From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow" srcset="https://substackcdn.com/image/fetch/$s_!C4_t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!C4_t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!C4_t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!C4_t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17ba1e8e-e774-41a5-9a84-1f3ef25ddf9b_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Prompt engineering is rarely enough to prepare a GenAI system for production.</p><p>While a single prompt can generate a good output in initial testing, deploying it across hundreds of real-world inputs exposes failures like missing details, incorrect facts, and formatting errors.</p><p>To build a production-ready system, you must transition from prompt 
optimization to systematic evaluation.</p><p><strong>A prompt defines your request; an evaluation workflow defines your standard and proves whether the system meets it.</strong></p><p>Here is a practical workflow to evaluate and improve GenAI applications.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SqQo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a0c054-22b7-478a-b984-bc90bf81e7bf_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SqQo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a0c054-22b7-478a-b984-bc90bf81e7bf_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!SqQo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a0c054-22b7-478a-b984-bc90bf81e7bf_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!SqQo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a0c054-22b7-478a-b984-bc90bf81e7bf_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!SqQo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a0c054-22b7-478a-b984-bc90bf81e7bf_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SqQo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a0c054-22b7-478a-b984-bc90bf81e7bf_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25a0c054-22b7-478a-b984-bc90bf81e7bf_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:704000,&quot;alt&quot;:&quot;From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/199950174?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a0c054-22b7-478a-b984-bc90bf81e7bf_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow" title="From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow" srcset="https://substackcdn.com/image/fetch/$s_!SqQo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a0c054-22b7-478a-b984-bc90bf81e7bf_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!SqQo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a0c054-22b7-478a-b984-bc90bf81e7bf_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!SqQo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a0c054-22b7-478a-b984-bc90bf81e7bf_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!SqQo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a0c054-22b7-478a-b984-bc90bf81e7bf_1024x1024.png 1456w" sizes="100vw"></picture><div 
class="image-link-expand"></div></div></a></figure></div><div><hr></div><h1>1. 
The Limits of Initial Testing</h1><p>It is easy to mistake a fluent LLM output for a correct one.</p><p>When a response has a professional tone and no grammatical errors, it is tempting to assume it is also accurate.</p><p>However, LLM outputs are non-deterministic, and a prompt that works for one test case can fail on another.</p><p>In production, incorrect outputs carry significant risks, whether it is a fabricated metric in an executive report or an incorrect policy in a customer support tool.</p><p>Rather than testing only under ideal conditions, developers must identify where a system fails on real-world inputs.</p><div><hr></div><h1>2. Why Prompting Alone Is Insufficient</h1><p>A well-crafted prompt is not a testing framework.</p><p>Relying entirely on prompts to ensure quality is insufficient for three reasons:</p><ul><li><p><strong>Input Variability:</strong> Real-world queries are often messy, incomplete, or poorly formatted.</p></li><li><p><strong>Model Variability:</strong> Even at temperature 0.0, models can generate slightly different outputs for the same input.</p></li><li><p><strong>System Dependencies:</strong> In complex architectures like Retrieval-Augmented Generation (RAG), the LLM is only one component. A prompt cannot correct a failure in the retrieval step.</p></li></ul><p>A polished output does not guarantee a reliable system.</p><p><strong>Instead of relying solely on prompt adjustments, you need a structured evaluation workflow.</strong></p><div><hr></div><h1>3. 
The 7-Step Evaluation Workflow</h1><p>To ensure predictable GenAI behavior, teams should implement a repeatable seven-step evaluation workflow.</p><ul><li><p>Step 1: Define the task in operational terms.</p></li><li><p>Step 2: Build a representative evaluation set.</p></li><li><p>Step 3: Break evaluation into specific dimensions.</p></li><li><p>Step 4: Choose the right grading method.</p></li><li><p>Step 5: Log failures and classify the errors.</p></li><li><p>Step 6: Modify one variable at a time.</p></li><li><p>Step 7: Define production thresholds.</p></li></ul><h2><strong>Step 1: Define the task in operational terms</strong></h2><p>You must translate vague requirements into objective criteria. For example, instead of asking for &#8220;a good summary,&#8221; define the exact parameters:</p><ul><li><p><strong>KPI Commentary:</strong> Accurate metrics, no causal claims unless explicitly backed by the data source, a concise tone, and a maximum of 150 words.</p></li><li><p><strong>SQL Explainer:</strong> Explains joins and filters correctly in plain language, linking them to a specific KPI.</p></li><li><p><strong>Customer RAG:</strong> Answers using only the provided context, cites sources, and states &#8220;I do not know&#8221; if context is missing.</p></li></ul><p>Identify the specific tasks, target audience, constraints, and failures that would render an output unusable.</p><h2><strong>Step 2: Build a representative evaluation set</strong></h2><p>Start with a small evaluation set of 10 to 30 examples. A massive evaluation set is difficult to manage early in development. The set must reflect real-world inputs rather than just ideal cases. 
It should contain:</p><ul><li><p><strong>Standard inputs:</strong> Common queries to test baseline functionality.</p></li><li><p><strong>Complex/Ambiguous inputs:</strong> Requests with mixed sentiments or multi-step instructions.</p></li><li><p><strong>Edge cases:</strong> Inputs with missing context or specific formatting constraints.</p></li><li><p><strong>High-risk inputs:</strong> Scenarios where errors have significant business or legal impacts.</p></li></ul><p>For example, an initial evaluation set for a customer-related task might look like the one below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ta6_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adcdc60-a092-425e-bedd-2932b35bc875_1165x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ta6_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adcdc60-a092-425e-bedd-2932b35bc875_1165x400.png 424w, https://substackcdn.com/image/fetch/$s_!ta6_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adcdc60-a092-425e-bedd-2932b35bc875_1165x400.png 848w, https://substackcdn.com/image/fetch/$s_!ta6_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adcdc60-a092-425e-bedd-2932b35bc875_1165x400.png 1272w, https://substackcdn.com/image/fetch/$s_!ta6_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adcdc60-a092-425e-bedd-2932b35bc875_1165x400.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ta6_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adcdc60-a092-425e-bedd-2932b35bc875_1165x400.png" width="1165" height="400" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3adcdc60-a092-425e-bedd-2932b35bc875_1165x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1165,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:59672,&quot;alt&quot;:&quot;From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/199950174?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adcdc60-a092-425e-bedd-2932b35bc875_1165x400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow" title="From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow" srcset="https://substackcdn.com/image/fetch/$s_!ta6_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adcdc60-a092-425e-bedd-2932b35bc875_1165x400.png 424w, https://substackcdn.com/image/fetch/$s_!ta6_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adcdc60-a092-425e-bedd-2932b35bc875_1165x400.png 848w, https://substackcdn.com/image/fetch/$s_!ta6_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adcdc60-a092-425e-bedd-2932b35bc875_1165x400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ta6_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3adcdc60-a092-425e-bedd-2932b35bc875_1165x400.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2><strong>Step 3: Break evaluation into specific dimensions</strong></h2><p>Avoid grading outputs with a single overall score, as it blends separate failure modes. 
Instead, assess performance across specific dimensions:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZHH6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa068874c-d646-4cbe-87e9-1e0a07d0a305_1168x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZHH6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa068874c-d646-4cbe-87e9-1e0a07d0a305_1168x684.png 424w, https://substackcdn.com/image/fetch/$s_!ZHH6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa068874c-d646-4cbe-87e9-1e0a07d0a305_1168x684.png 848w, https://substackcdn.com/image/fetch/$s_!ZHH6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa068874c-d646-4cbe-87e9-1e0a07d0a305_1168x684.png 1272w, https://substackcdn.com/image/fetch/$s_!ZHH6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa068874c-d646-4cbe-87e9-1e0a07d0a305_1168x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZHH6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa068874c-d646-4cbe-87e9-1e0a07d0a305_1168x684.png" width="1168" height="684" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a068874c-d646-4cbe-87e9-1e0a07d0a305_1168x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98810,&quot;alt&quot;:&quot;From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/199950174?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa068874c-d646-4cbe-87e9-1e0a07d0a305_1168x684.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow" title="From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow" srcset="https://substackcdn.com/image/fetch/$s_!ZHH6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa068874c-d646-4cbe-87e9-1e0a07d0a305_1168x684.png 424w, https://substackcdn.com/image/fetch/$s_!ZHH6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa068874c-d646-4cbe-87e9-1e0a07d0a305_1168x684.png 848w, https://substackcdn.com/image/fetch/$s_!ZHH6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa068874c-d646-4cbe-87e9-1e0a07d0a305_1168x684.png 1272w, https://substackcdn.com/image/fetch/$s_!ZHH6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa068874c-d646-4cbe-87e9-1e0a07d0a305_1168x684.png 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"></div></div></a></figure></div><p>Select the dimensions relevant to your application. A RAG system focuses on Grounding and Completeness, while a code generation tool focuses on Task Success and Format Fidelity.</p><h2><strong>Step 4: Choose the right grading method</strong></h2><p>Use the simplest method that provides accurate results. You can grade outputs using four primary approaches:</p><ol><li><p><strong>Rule-Based Checks: </strong>Programmatic, deterministic, and highly reliable. 
Ideal for formatting and constraints (e.g., validating output against a JSON schema or checking character counts).</p></li><li><p><strong>Reference-Based Checks: </strong>Used when there is a ground-truth answer (e.g., comparing predicted classification labels against known labels, or comparing the results of generated SQL against those of a reference query).</p></li><li><p><strong>LLM-as-a-Judge: </strong>Used to assess semantic or stylistic dimensions, such as tone and factual consistency, at scale. LLM judges require a strict grading rubric and few-shot examples to maintain consistency.</p></li><li><p><strong>Human Review: </strong>Recommended for highly sensitive or high-impact tasks. Spot-checks by domain experts are also used to calibrate and validate automated LLM judges.</p></li></ol><h2><strong>Step 5: Log failures and classify the errors</strong></h2><p>When a test case fails, identify the root cause before changing variables. A failure in a GenAI system does not always stem from the prompt.</p><p>Classify errors into specific categories:</p><ul><li><p><strong>Prompt Issue:</strong> Instructions are vague or contain conflicting constraints.</p></li><li><p><strong>Retrieval Issue:</strong> The context provided to the model is incomplete, irrelevant, or outdated.</p></li><li><p><strong>Data Issue:</strong> The underlying data source contains incorrect or corrupted information.</p></li><li><p><strong>Model Issue:</strong> The model ignores instructions or generates incorrect claims despite correct prompts and context.</p></li><li><p><strong>Requirements Issue:</strong> The operational criteria for the task were poorly defined.</p></li></ul><p>For example, a retrieval failure cannot be fixed by editing the prompt, and a data quality issue cannot be resolved by upgrading the model. 
Fix the issue at its source.</p><h2><strong>Step 6: Modify one variable at a time</strong></h2><p>When optimizing the system, change only one variable at a time to isolate what improves or degrades performance.</p><p>Follow this process:</p><ol><li><p><strong>Establish a baseline:</strong> Run your current evaluation set.</p></li><li><p><strong>Identify failures:</strong> Audit failed cases to determine the primary error type.</p></li><li><p><strong>Isolate a single change:</strong> Modify exactly one parameter (e.g., update a prompt rule, adjust chunk size, or change the model temperature).</p></li><li><p><strong>Rerun and compare:</strong> Run the evaluation set again and compare results against the baseline to verify improvement.</p></li></ol><h2><strong>Step 7: Define production thresholds</strong></h2><p>Establish clear metric thresholds to determine if the system is ready to deploy. For subjective dimensions, a standard 1-to-5 rubric is useful:</p><ul><li><p>5 &#8212; Accurate, fully grounded in context, and correctly formatted.</p></li><li><p>4 &#8212; High quality; minor stylistic issues but safe to deploy without review.</p></li><li><p>3 &#8212; Generally correct; minor phrasing or formatting issues requiring human oversight.</p></li><li><p>2 &#8212; Significant gaps, unsupported claims, or ignored constraints.</p></li><li><p>1 &#8212; Incorrect, contains major errors, or is structurally broken.</p></li></ul><p>Set your deployment thresholds based on risk. 
A low-risk internal tool might require a minimum average score of 3.5, whereas a high-risk or external application may require a minimum of 4.5 on all core dimensions and a strict 5.0 for factual grounding.</p><div><hr></div><h1>4. Case Study: Evaluation in Practice</h1><p>Consider a GenAI assistant designed to summarize raw analyst notes for stakeholders.</p><p><strong>Input Data (Analyst Notes):</strong></p><ul><li><p>Metrics: Active customers at 12,400 (up 8% QoQ, down 3% YoY due to a seasonal promotion). Revenue at $4.2M (met target of $4.1M, driven by enterprise renewals).</p></li><li><p>Churn: Rose to 4.2% in March (up from 3.5% in January). CS team suspects a competitor release.</p></li><li><p>Next Actions: CS team to contact high-risk renewals; Product team to release an update in June.</p></li></ul><p><strong>Generated Output (With Errors):</strong></p><p>An LLM generates the following summary:</p><p>&#8220;Q1 was a stellar quarter for the business. Active customers reached 12,400, showing strong growth. Total revenue reached $4.2M, beating our target of $4.1M due to an incredibly successful seasonal marketing campaign. 
Although churn rose to 4.2%, our proactive CS team has already contacted all high-risk accounts to guarantee renewal.&#8221;</p><p>A structured assessment reveals key discrepancies:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lfZ2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1c1294-f07d-44f6-a2d3-6ab64d0ebf3b_1170x968.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lfZ2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1c1294-f07d-44f6-a2d3-6ab64d0ebf3b_1170x968.png 424w, https://substackcdn.com/image/fetch/$s_!lfZ2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1c1294-f07d-44f6-a2d3-6ab64d0ebf3b_1170x968.png 848w, https://substackcdn.com/image/fetch/$s_!lfZ2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1c1294-f07d-44f6-a2d3-6ab64d0ebf3b_1170x968.png 1272w, https://substackcdn.com/image/fetch/$s_!lfZ2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1c1294-f07d-44f6-a2d3-6ab64d0ebf3b_1170x968.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lfZ2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1c1294-f07d-44f6-a2d3-6ab64d0ebf3b_1170x968.png" width="1170" height="968" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee1c1294-f07d-44f6-a2d3-6ab64d0ebf3b_1170x968.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:968,&quot;width&quot;:1170,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98578,&quot;alt&quot;:&quot;From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/199950174?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1c1294-f07d-44f6-a2d3-6ab64d0ebf3b_1170x968.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow" title="From Prompt to Reliable Output: A Practical GenAI Evaluation Workflow" srcset="https://substackcdn.com/image/fetch/$s_!lfZ2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1c1294-f07d-44f6-a2d3-6ab64d0ebf3b_1170x968.png 424w, https://substackcdn.com/image/fetch/$s_!lfZ2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1c1294-f07d-44f6-a2d3-6ab64d0ebf3b_1170x968.png 848w, https://substackcdn.com/image/fetch/$s_!lfZ2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1c1294-f07d-44f6-a2d3-6ab64d0ebf3b_1170x968.png 1272w, https://substackcdn.com/image/fetch/$s_!lfZ2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee1c1294-f07d-44f6-a2d3-6ab64d0ebf3b_1170x968.png 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Evaluating these distinct dimensions allows the team to identify exactly where the model failed. 
The prompt can then be updated with constraints requiring objective reporting and preventing the model from inferring causality.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/from-prompt-to-reliable-output-a/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/from-prompt-to-reliable-output-a/comments"><span>Leave a comment</span></a></p><div><hr></div><h1>5. Why Technical Teams Must Own Evaluation</h1><p>Evaluating software and data models is a fundamental engineering discipline. Teams already possess the core mental models required:</p><ul><li><p>Understanding that single test cases are not representative of overall performance.</p></li><li><p>Distinguishing between individual qualitative examples and statistical evidence.</p></li><li><p>Applying structured metrics like precision, recall, and error distributions to assess behavior.</p></li></ul><p>Deploying GenAI requires applying these same disciplines to unstructured outputs. </p><p>Reliable production systems are built by designing, testing, and verifying performance systematically rather than focusing solely on creative prompt writing.</p><div><hr></div><h1>Conclusion</h1><p>Prompting is a starting point, but systematic evaluation is what makes a system production-ready. 
By defining tasks operationally, building representative evaluation sets, assessing performance across distinct dimensions, and optimizing variables individually, developers can build dependable GenAI applications.</p>]]></content:encoded></item><item><title><![CDATA[Best MCP Servers for Stock Market Data]]></title><description><![CDATA[The best MCP servers for connecting AI agents to financial workflow]]></description><link>https://www.nb-data.com/p/best-mcp-servers-for-stock-market</link><guid isPermaLink="false">https://www.nb-data.com/p/best-mcp-servers-for-stock-market</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Thu, 21 May 2026 15:58:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QrbF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5f7c9b-5a53-48f5-b71c-c46bc58678d1_1280x853.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QrbF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5f7c9b-5a53-48f5-b71c-c46bc58678d1_1280x853.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QrbF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5f7c9b-5a53-48f5-b71c-c46bc58678d1_1280x853.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QrbF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5f7c9b-5a53-48f5-b71c-c46bc58678d1_1280x853.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!QrbF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5f7c9b-5a53-48f5-b71c-c46bc58678d1_1280x853.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QrbF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5f7c9b-5a53-48f5-b71c-c46bc58678d1_1280x853.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QrbF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5f7c9b-5a53-48f5-b71c-c46bc58678d1_1280x853.jpeg" width="1280" height="853" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f5f7c9b-5a53-48f5-b71c-c46bc58678d1_1280x853.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:853,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QrbF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5f7c9b-5a53-48f5-b71c-c46bc58678d1_1280x853.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QrbF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5f7c9b-5a53-48f5-b71c-c46bc58678d1_1280x853.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!QrbF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5f7c9b-5a53-48f5-b71c-c46bc58678d1_1280x853.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QrbF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f5f7c9b-5a53-48f5-b71c-c46bc58678d1_1280x853.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@yorgosntrahas?utm_source=medium&amp;utm_medium=referral">Yorgos 
Ntrahas</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure></div><p>AI agents are becoming more useful for financial research, but they are only as reliable as the data they can access.</p><p>A model can summarize earnings, compare stocks, or assist in market monitoring. But without direct access to structured financial data, its answers quickly go stale. Developers end up writing custom API wrappers or relying on the model to interpret raw data services on its own, and neither is ideal when accurate market information matters.</p><p>This is where the <strong>Model Context Protocol (MCP)</strong> becomes useful.</p><p>An MCP server gives an AI agent a consistent interface for discovering tools, calling data services, and interacting reliably with external systems. For stock market work, this lets an agent request data such as prices or fundamentals through one structured interface.</p><p>However, not every market data provider is equally suited to MCP-based workflows. Some offer broad asset coverage and agent-native integrations, while others excel at historical data, institutional-grade fundamentals, or enterprise financial analytics. The right choice depends on what you want to build.</p><p>In this article, we compare the best MCP servers for stock market data, focusing on five providers:</p><ol><li><p>Alpha Vantage</p></li><li><p>Nasdaq Data Link</p></li><li><p>Tiingo</p></li><li><p>Intrinio</p></li><li><p>FactSet</p></li></ol><p>Curious about it? Let&#8217;s get into it.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>1. 
Alpha Vantage (Best Overall)</h3><p><a href="http://alphavantage.co">Alpha Vantage</a> is the strongest overall choice for developers who want to connect LLMs and AI agents to stock market data through MCP. Its official MCP server gives agents a structured way to discover financial tools, retrieve the right data, and work with market information more reliably than a custom set of hardcoded API calls.</p><p>This matters because financial agents need more than simple price lookups. They may need historical OHLCV (open, high, low, close, volume) data, performance comparisons across multiple tickers, or market information combined with macroeconomic data. Alpha Vantage is one of the few providers that can support this range of workflows from a single data source, with coverage that spans stocks, ETFs, funds, indices, options, forex, commodities, fundamentals, technicals, and economic indicators.</p><p>Its market data is well suited to serious financial applications, and it arrives in a structured form that agents can reason over. The MCP layer adds to this by exposing tools that an agent can inspect and invoke directly.</p><p>The Alpha Vantage MCP server is easy to set up and integrates with popular agent environments such as Claude, Cursor, VS Code, and ChatGPT. Developers can connect remotely through a single MCP URL or run the server locally with uvx, which makes it a good fit for research agents, coding tools, financial dashboards, and prototypes that need minimal setup.</p><p>Alpha Vantage is also a good fit for MCP workflows that go beyond raw data retrieval into analysis. Users can ask an agent to retrieve NVDA&#8217;s recent prices, calculate RSI, summarize trends, compare quarterly movements across AI stocks, or combine fundamentals and technicals into a market brief. 
This versatility makes Alpha Vantage the best default for stock market MCP integrations.</p><h4>Quickstart Example</h4><p>For a remote MCP connection:</p><pre><code>https://mcp.alphavantage.co/mcp?apikey=YOUR_API_KEY</code></pre><p>For a local MCP connection:</p><pre><code>uvx marketdata-mcp-server YOUR_API_KEY</code></pre><p>Example Cursor configuration:</p><pre><code>{
  &quot;mcpServers&quot;: {
    &quot;alphavantage&quot;: {
      &quot;url&quot;: &quot;https://mcp.alphavantage.co/mcp?apikey=YOUR_API_KEY&quot;
    }
  }
}</code></pre><p>Then, you can ask your agent something like:</p><pre><code>Use Alpha Vantage MCP to get NVDA daily price data for the past month,
calculate RSI, and summarize the trend.</code></pre><h4>Best for</h4><p>Developers and teams that want the most balanced MCP server for financial agents: broad market coverage, strong analytical flexibility, straightforward setup, and enough data depth to support both simple market questions and more serious research workflows.</p><div><hr></div><h3>2. Nasdaq Data Link (Best for Research-Grade Financial Datasets)</h3><p><a href="https://data.nasdaq.com/institutional-investors">Nasdaq Data Link</a> is a strong choice for teams that want an MCP-based financial agent to work with deeper, more research-oriented datasets, rather than focusing only on everyday price lookups. It is more compelling when the workflow depends on structured historical datasets, specialized financial information, and a richer research context.</p><p>This makes it especially useful for financial agents designed to support investment research, economic analysis, strategy exploration, or data-heavy market investigations. Instead of limiting the agent to asking, &#8220;What happened to this ticker today?&#8221;, Nasdaq Data Link is better suited to questions that require more structured context, such as comparing market behavior across time, analyzing broader economic relationships, or working with datasets that go beyond standard price and indicator endpoints.</p><p>The trade-off is that Nasdaq Data Link is not as immediately straightforward for general-purpose MCP stock workflows as Alpha Vantage. Its value is highest when the team already knows the type of dataset it wants to expose to the agent and is building around more deliberate research use cases. For lighter stock analysis, Alpha Vantage will often be faster to adopt. For dataset-centered financial intelligence, Nasdaq Data Link becomes much more attractive.</p><h4>Quickstart Example</h4><p>A fitting example prompt would be:</p><pre><code>Use the Nasdaq Data Link MCP integration to retrieve a relevant historical
financial dataset, summarize the major trend changes over time, and highlight
periods that deserve closer investigation.</code></pre><h4>Best for</h4><p>Developers, analysts, and research teams that want MCP-based financial agents to work with deeper, dataset-oriented market analysis rather than only quotes, indicators, or lightweight stock lookups.</p><div><hr></div><h3>3. Tiingo (Best for Market Data and News-Aware Analysis)</h3><p><a href="https://www.tiingo.com/">Tiingo</a> is a strong option for developers who want an MCP-based financial workflow that feels more like a research assistant than a simple market quote tool. It fits agents that need to connect stock market data with richer context about what is happening in the market.</p><p>This is especially useful because many financial questions can&#8217;t be answered by price data alone. A stock can move sharply due to earnings, guidance, sector sentiment, or broader news. In those cases, an AI agent is more helpful when it can pair structured market data with context about what may be driving the movement, rather than just reporting the price change. Research highlights the importance of combining numerical data with textual information, such as news, to create better market intelligence systems.</p><p>Tiingo ranks well for market analysis workflows that require both data retrieval and narrative context. An MCP-connected agent using Tiingo can handle tasks such as explaining recent stock movements, summarizing companies, comparing historical trends, or assembling comprehensive ticker briefings, rather than treating the market only as a numerical dataset.</p><p>Tiingo&#8217;s main strength isn&#8217;t covering every financial use case but providing a useful middle ground: more contextual than basic market-data APIs and more approachable than institutional platforms. 
It&#8217;s a good fit for teams building MCP workflows that help agents move from &#8220;what happened?&#8221; to &#8220;what happened, and what context should be considered?&#8221;</p><h4>Quickstart Example</h4><p>A representative prompt shows the use case:</p><pre><code>Use the Tiingo MCP integration to review TSLA&#8217;s recent price movement,
surface any relevant market or company context, and summarize the key
points in a short investor-style briefing.</code></pre><h4>Best for</h4><p>Developers and small teams building financial agents that need to combine stock market data retrieval with richer interpretive context, especially for market briefings, watchlist assistants, and news-aware stock analysis workflows.</p><div><hr></div><h3>4. Intrinio (Best for Fundamentals-Driven Financial Agents)</h3><p><a href="https://intrinio.com/">Intrinio</a> is best suited for developers building MCP-based financial agents that dig into company fundamentals, financial statements, valuation context, and business performance analysis. It is better positioned for agents that need to explain a company rather than merely describe its stock movement.</p><p>This matters because many useful financial workflows start with questions that are not about price. Investors, analysts, and business users want to know about revenue growth, margin changes, debt levels, or how companies compare on profitability and valuation. An MCP-connected agent is more valuable when it can retrieve structured company data and turn it into a clear financial narrative.</p><p>Intrinio earns its place in this ranking as a professional, fundamentals-focused option rather than a source of market prices alone. It&#8217;s valuable for AI workflows that support deliberate analysis, such as comparing companies, reviewing business quality, identifying financial strengths, and preparing structured research similar to early analyst work.</p><p>This makes Intrinio especially appealing for teams that want to build agents for equity research, corporate analysis, and data-backed investment workflows. The trade-off is that it may be more than necessary for a lightweight ticker-monitoring agent. 
But when the job is to reason about companies with more financial depth, Intrinio deserves its place high in the ranking.</p><h4>Quickstart Example</h4><p>A representative prompt could be:</p><pre><code>Use the Intrinio MCP integration to analyze MSFT&#8217;s latest financial performance,
summarize revenue growth, operating margin, cash flow trends, and key balance
sheet observations, then explain what stands out for an equity research brief.</code></pre><h4>Best for</h4><p>Developers and teams building financial agents for company fundamentals, equity research, earnings analysis, valuation support, and more structured business-performance summaries.</p><div><hr></div><h3>5. FactSet (Best for Enterprise Financial Intelligence)</h3><p><a href="https://www.factset.com/">FactSet</a> is the top choice for organizations wanting enterprise-grade financial agents based on institutional data, research workflows, and decision-support tools. Unlike other providers, it focuses on powering AI in serious financial environments like investment banking, wealth management, and institutional research.</p><p>FactSet also stands out because of how naturally it fits the broader movement toward AI agents embedded in financial workflows. In February 2026, Anthropic announced new enterprise AI plug-ins developed with partners including FactSet, aimed at work in investment banking, wealth management, and other professional domains. That does not make FactSet a lightweight public MCP server in the same way Alpha Vantage is positioned, but it does strengthen its place in this article as the premium option for teams thinking about agentic finance at an institutional level.</p><p>For an MCP-based stock market article, FactSet works best as the <strong>enterprise intelligence</strong> pick: the provider for teams that care about combining financial data access with mature workflow context, internal research processes, and high-stakes analysis. 
An agent connected to FactSet-style capabilities could support more advanced tasks such as summarizing company developments, reviewing investment opportunities, generating analyst-style briefs, or helping build pitch materials that rely on trusted financial information rather than generic web retrieval.</p><p>The trade-off is that FactSet is not the most approachable choice for individual developers or smaller prototypes. Its value becomes clearest when the application is part of a broader professional research stack and the user needs institutional depth, workflow integration, and enterprise credibility. For those use cases, FactSet deserves the final place in this ranking, not because it is the easiest option, but because it is one of the most powerful when the objective is serious financial intelligence.</p><h4>Quickstart Example</h4><p>Because FactSet&#8217;s public AI positioning is more <strong>enterprise- and partner-led</strong> than self-serve, this section uses an example prompt rather than a generic public MCP connection snippet.</p><pre><code>Use the FactSet-connected financial agent to review a target company,
summarize recent business developments, highlight major financial risks,
and prepare an executive-style research brief for an investment discussion.</code></pre><h4>Best for</h4><p>Enterprise teams, investment professionals, and financial institutions that want AI agents built around institutional-grade research workflows, professional market intelligence, and high-value financial analysis.</p><div><hr></div><h3>Conclusion</h3><p>The best MCP server for stock market data depends on the kind of financial agent you want to build:</p><ul><li><p><strong>Alpha Vantage: </strong>Best overall for broad market coverage and flexible agent workflows</p></li><li><p><strong>Nasdaq Data Link: </strong>Best for research-grade financial datasets</p></li><li><p><strong>Tiingo: </strong>Best for market data with stronger context for briefings and analysis</p></li><li><p><strong>Intrinio: </strong>Best for fundamentals-driven company research</p></li><li><p><strong>FactSet: </strong>Best for enterprise financial intelligence</p></li></ul><p>That&#8217;s all for now. I hope it helps!</p>]]></content:encoded></item><item><title><![CDATA[The GenAI Skill Data Professionals Need Most: Evaluation]]></title><description><![CDATA[Only by evaluating can we know that our system is good.]]></description><link>https://www.nb-data.com/p/the-genai-skill-data-professionals</link><guid isPermaLink="false">https://www.nb-data.com/p/the-genai-skill-data-professionals</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Thu, 14 May 2026 17:01:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ugB7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!ugB7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ugB7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png 424w, https://substackcdn.com/image/fetch/$s_!ugB7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png 848w, https://substackcdn.com/image/fetch/$s_!ugB7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png 1272w, https://substackcdn.com/image/fetch/$s_!ugB7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ugB7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png" width="1020" height="551" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:551,&quot;width&quot;:1020,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:986860,&quot;alt&quot;:&quot;The GenAI Skill Data Professionals Need Most: 
Evaluation&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/197661414?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The GenAI Skill Data Professionals Need Most: Evaluation" title="The GenAI Skill Data Professionals Need Most: Evaluation" srcset="https://substackcdn.com/image/fetch/$s_!ugB7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png 424w, https://substackcdn.com/image/fetch/$s_!ugB7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png 848w, https://substackcdn.com/image/fetch/$s_!ugB7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png 1272w, https://substackcdn.com/image/fetch/$s_!ugB7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60e5782f-b33b-410e-be3c-78fc501b9eb3_1020x551.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 
4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>GenAI has made output generation cheap.</p><p>A decent prompt can already produce summaries, classifications, SQL explanations, insight drafts, documentation, and stakeholder-facing text.</p><p>The harder part is deciding:</p><ul><li><p>Is the output correct?</p></li><li><p>Is it grounded in the input?</p></li><li><p>Is it consistent across realistic cases?</p></li><li><p>Would I trust it inside an actual workflow?</p></li></ul><p>That is the opening for data professionals. Evaluation turns GenAI from a demo into something that can be used responsibly in real work.</p><p><a href="https://developers.openai.com/api/docs/guides/evaluation-best-practices">OpenAI&#8217;s current eval guidance</a> makes the same distinction: the useful kind of eval is not a public benchmark. It is a task-specific test for the application you are building.</p><p>By the end of this article, you should have a clearer view of what evaluation means in practice, what to test for, and why data professionals are well-positioned to own this skill. </p><p>Curious about it? 
Let&#8217;s get into it.</p><h1>GenAI output is easy. Trusting it is not.</h1><p>Prompting gets attention because it is visible.</p><p>Evaluation matters more because it decides whether the work holds up.</p><p>Most professionals first experience GenAI as a productivity tool: ask, receive, refine. That makes it tempting to treat a fluent output as a good output.</p><p>But GenAI systems are variable.</p><p>The same system can behave differently across prompts, data slices, and edge cases.</p><p>In professional settings, the cost of a wrong answer is higher than the cost of a slightly worse prompt.</p><p>That is why evaluation matters.</p><div><hr></div><h1>What does evaluation actually mean in real work?</h1><p>Evaluation here does not mean comparing GPT-5.5 against Claude on a benchmark.</p><p>It means testing a GenAI workflow against the task&#8217;s standards.</p><p>The usual practice is to separate broad model benchmarks from the specific evaluations you design for your own application. Google&#8217;s evaluation documentation also frames this as a test-driven process: define the task, prepare evaluation data, choose the quality criteria, and inspect results.</p><p>Consider workflows that data professionals might evaluate:</p><ul><li><p>A GenAI assistant that summarizes KPI movements</p></li><li><p>A RAG system that answers questions from internal policy documents</p></li><li><p>A classifier that tags customer feedback or support tickets</p></li><li><p>A model that explains SQL output to business users</p></li></ul><p>These are not abstract research problems. 
They are real tasks that need real testing.</p><div><hr></div><h1>The four questions every GenAI workflow should answer</h1><p>The heart of evaluation is not a long taxonomy of metrics. It comes down to four questions:</p><h3><strong>A. Did it complete the task?</strong></h3><p>This asks whether the GenAI application did what it was asked to do, such as classifying the ticket into one of the allowed categories or summarizing the table instead of restating it.</p><p>In technical terms, this is often measured through deterministic checks: regex matching for expected formats, JSON schema validation, or exact-match accuracy against a predefined label set.</p><div class="callout-block" data-callout="true"><p style="text-align: center;"><code>Exact Match = 1 if output matches ground truth, else 0</code></p></div><p>This is the simplest evaluation dimension, but also the one most often skipped.</p><h3><strong>B. Is it correct and grounded?</strong></h3><p>Did it stay faithful to the source data, retrieved context, or provided evidence?</p><p>This is especially important for RAG systems, analytical summaries, and policy assistants.</p><p>Basic text overlap metrics such as ROUGE or BLEU are not sufficient here. They check surface-level word similarity but miss meaning.</p><p>A stronger approach is to compare the output and source in the embedding space. A common metric for that is cosine similarity between the two embedding vectors A and B:</p><div class="callout-block" data-callout="true"><p style="text-align: center;"><code>Cosine Similarity(A, B) = (A &#183; B) / (||A|| &#215; ||B||)</code></p></div><p>A high cosine similarity means the generated text is semantically close to the source, though closeness alone does not guarantee that every claim is supported. A low score may indicate the model drifted or fabricated information.</p><p>For RAG systems specifically, you also need to check whether the retrieval step actually found the right documents before the model started generating. 
That is where Recall@K matters:</p><div class="callout-block" data-callout="true"><p style="text-align: center;"><code>Recall@K = |Relevant Documents &#8745; Top-K Retrieved| / |Relevant Documents|</code></p></div><p>If the retriever misses the relevant source, even a perfect generator has nothing correct to ground its answer in.</p><p>These are only a few of the metrics available for generative AI.</p><h3><strong>C. Is the quality consistent across realistic cases?</strong></h3><p>A single good example proves very little.</p><p>The system should be tested against a range of cases, such as:</p><ul><li><p>Easy cases</p></li><li><p>Ambiguous cases</p></li><li><p>Incomplete inputs</p></li><li><p>Edge cases</p></li><li><p>Cases where the correct response is &#8220;not enough information.&#8221;</p></li></ul><p>You already know not to trust a model based on a single clean validation sample.</p><p>GenAI deserves the same discipline. That means building automated test suites that run every time the prompt, model version, or pipeline changes.</p><h3><strong>D. How does it fail?</strong></h3><p>This is the most important professional instinct.</p><p>A workflow that fails clearly is easier to manage than one that sounds confident while being wrong.</p><p>A good evaluation should reveal things such as:</p><ul><li><p>Unsupported claims</p></li><li><p>Fabricated numbers</p></li><li><p>Hidden assumptions</p></li><li><p>Formatting failures</p></li><li><p>Misleading simplifications</p></li></ul><p>This is where logging and tracing tools become important. Capturing inputs and outputs systematically lets you identify failure patterns instead of guessing.</p><p>The job is not to prove that the system works. 
The job is to learn where it fails before someone depends on it.</p><div><hr></div><h1>An example: evaluating a KPI commentary assistant</h1><p>To make this concrete, consider a workflow that many data teams will recognize.</p><p>A GenAI assistant receives a monthly KPI table and writes a short business commentary:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MtHV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3054c7d0-b4d0-4d29-b704-bf6fb227f8c1_1170x119.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MtHV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3054c7d0-b4d0-4d29-b704-bf6fb227f8c1_1170x119.png 424w, https://substackcdn.com/image/fetch/$s_!MtHV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3054c7d0-b4d0-4d29-b704-bf6fb227f8c1_1170x119.png 848w, https://substackcdn.com/image/fetch/$s_!MtHV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3054c7d0-b4d0-4d29-b704-bf6fb227f8c1_1170x119.png 1272w, https://substackcdn.com/image/fetch/$s_!MtHV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3054c7d0-b4d0-4d29-b704-bf6fb227f8c1_1170x119.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MtHV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3054c7d0-b4d0-4d29-b704-bf6fb227f8c1_1170x119.png" width="1170" height="119" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3054c7d0-b4d0-4d29-b704-bf6fb227f8c1_1170x119.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:119,&quot;width&quot;:1170,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14418,&quot;alt&quot;:&quot;The GenAI Skill Data Professionals Need Most: Evaluation&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/197661414?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3054c7d0-b4d0-4d29-b704-bf6fb227f8c1_1170x119.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The GenAI Skill Data Professionals Need Most: Evaluation" title="The GenAI Skill Data Professionals Need Most: Evaluation" srcset="https://substackcdn.com/image/fetch/$s_!MtHV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3054c7d0-b4d0-4d29-b704-bf6fb227f8c1_1170x119.png 424w, https://substackcdn.com/image/fetch/$s_!MtHV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3054c7d0-b4d0-4d29-b704-bf6fb227f8c1_1170x119.png 848w, https://substackcdn.com/image/fetch/$s_!MtHV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3054c7d0-b4d0-4d29-b704-bf6fb227f8c1_1170x119.png 1272w, https://substackcdn.com/image/fetch/$s_!MtHV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3054c7d0-b4d0-4d29-b704-bf6fb227f8c1_1170x119.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The 
expected output should:</p><ul><li><p>Identify that revenue declined</p></li><li><p>Mention lower acquisition and higher churn</p></li><li><p>Avoid claiming causality unless supported</p></li><li><p>Keep the commentary concise and business-appropriate</p></li></ul><p>Then you evaluate:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3sBg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae7e556-8890-492f-9331-64f99be0ea98_1176x307.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3sBg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae7e556-8890-492f-9331-64f99be0ea98_1176x307.png 424w, https://substackcdn.com/image/fetch/$s_!3sBg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae7e556-8890-492f-9331-64f99be0ea98_1176x307.png 848w, https://substackcdn.com/image/fetch/$s_!3sBg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae7e556-8890-492f-9331-64f99be0ea98_1176x307.png 1272w, https://substackcdn.com/image/fetch/$s_!3sBg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae7e556-8890-492f-9331-64f99be0ea98_1176x307.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3sBg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae7e556-8890-492f-9331-64f99be0ea98_1176x307.png" width="1176" height="307" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bae7e556-8890-492f-9331-64f99be0ea98_1176x307.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:307,&quot;width&quot;:1176,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:41497,&quot;alt&quot;:&quot;The GenAI Skill Data Professionals Need Most: Evaluation&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/197661414?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae7e556-8890-492f-9331-64f99be0ea98_1176x307.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The GenAI Skill Data Professionals Need Most: Evaluation" title="The GenAI Skill Data Professionals Need Most: Evaluation" srcset="https://substackcdn.com/image/fetch/$s_!3sBg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae7e556-8890-492f-9331-64f99be0ea98_1176x307.png 424w, https://substackcdn.com/image/fetch/$s_!3sBg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae7e556-8890-492f-9331-64f99be0ea98_1176x307.png 848w, https://substackcdn.com/image/fetch/$s_!3sBg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae7e556-8890-492f-9331-64f99be0ea98_1176x307.png 1272w, https://substackcdn.com/image/fetch/$s_!3sBg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbae7e556-8890-492f-9331-64f99be0ea98_1176x307.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This gives you something concrete to recognize in your own work. 
It also separates your evaluation from a generic evaluation.</p><div><hr></div><h1>A simple evaluation loop for data professionals</h1><p>You do not need a full evaluation framework to start.</p><p>A five-step loop is enough:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OqZC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cea89b1-53cb-4a51-ad30-91847d7e79eb_2001x191.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OqZC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cea89b1-53cb-4a51-ad30-91847d7e79eb_2001x191.png 424w, https://substackcdn.com/image/fetch/$s_!OqZC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cea89b1-53cb-4a51-ad30-91847d7e79eb_2001x191.png 848w, https://substackcdn.com/image/fetch/$s_!OqZC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cea89b1-53cb-4a51-ad30-91847d7e79eb_2001x191.png 1272w, 
https://substackcdn.com/image/fetch/$s_!OqZC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cea89b1-53cb-4a51-ad30-91847d7e79eb_2001x191.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OqZC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cea89b1-53cb-4a51-ad30-91847d7e79eb_2001x191.png" width="1456" height="139" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9cea89b1-53cb-4a51-ad30-91847d7e79eb_2001x191.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:139,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26854,&quot;alt&quot;:&quot;The GenAI Skill Data Professionals Need Most: Evaluation&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/197661414?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cea89b1-53cb-4a51-ad30-91847d7e79eb_2001x191.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The GenAI Skill Data Professionals Need Most: Evaluation" title="The GenAI Skill Data Professionals Need Most: Evaluation" srcset="https://substackcdn.com/image/fetch/$s_!OqZC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cea89b1-53cb-4a51-ad30-91847d7e79eb_2001x191.png 424w, https://substackcdn.com/image/fetch/$s_!OqZC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cea89b1-53cb-4a51-ad30-91847d7e79eb_2001x191.png 848w, 
https://substackcdn.com/image/fetch/$s_!OqZC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cea89b1-53cb-4a51-ad30-91847d7e79eb_2001x191.png 1272w, https://substackcdn.com/image/fetch/$s_!OqZC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9cea89b1-53cb-4a51-ad30-91847d7e79eb_2001x191.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>1. Define the job clearly.</strong> What should the GenAI system actually do?</p><p><strong>2. Collect representative test cases.</strong> Not just the clean ones. Include strange, messy, and borderline cases.</p><p><strong>3. Write a simple scoring rubric.</strong> For structured outputs, use exact-match checks. For subjective quality, use a binary (0/1) rubric or a short LLM-as-a-judge prompt.</p><p><strong>4. Compare outputs across prompts, models, or versions against the same cases.</strong> Eval guidance emphasizes using evals to test and iterate rather than relying on ad hoc inspection.</p><p><strong>5. 
Record recurring failures.</strong> A recurring failure pattern tells you more than a single overall score.</p><p>A typical evaluation stack follows the same pattern: evaluation datasets, rubric-based measures, deterministic metrics where suitable, and custom checks for task-specific requirements.</p><div><hr></div><h1>Why data professionals are well-positioned to own this</h1><p>Evaluation is not unfamiliar territory for data professionals.</p><p>It draws on habits we already use:</p><ul><li><p>Defining success criteria</p></li><li><p>Building representative samples</p></li><li><p>Distinguishing anecdotes from evidence</p></li><li><p>Checking edge cases</p></li><li><p>Analyzing errors instead of celebrating one good result</p></li><li><p>Deciding whether an output is fit for use</p></li></ul><p>Data professionals already have much of the mindset evaluation requires.</p><p>The shift is applying that discipline to generative systems, not only predictive models.</p><p>That is why evaluation may become one of the most valuable GenAI-adjacent skills for analysts, data scientists, analytics engineers, and ML practitioners.</p><div><hr></div><h1>Conclusion</h1><p>Evaluation is what turns GenAI from something impressive into something dependable. For data professionals, that matters more than learning a clever prompt pattern or chasing the newest model release.</p><p>The real advantage comes from knowing how to define good output, test it against realistic cases, trace where it fails, and decide whether it is ready to support actual work.</p><p>That habit is already close to how strong data professionals think: establish the standard, examine the evidence, and avoid trusting a result just because it looks convincing. GenAI simply gives that discipline a new place to matter. 
As these systems move deeper into analysis, reporting, search, and decision support, the professionals who can evaluate them well will be the ones who make them genuinely useful.</p>]]></content:encoded></item><item><title><![CDATA[The ML Skills That Still Matter in 2026]]></title><description><![CDATA[The skills that still make you valuable to companies and keep you ahead of the field]]></description><link>https://www.nb-data.com/p/the-ml-skills-that-still-matter-in</link><guid isPermaLink="false">https://www.nb-data.com/p/the-ml-skills-that-still-matter-in</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Thu, 30 Apr 2026 02:01:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xNHD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xNHD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xNHD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!xNHD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!xNHD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!xNHD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xNHD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:855757,&quot;alt&quot;:&quot;The ML Skills That Still Matter in 2026&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/195882313?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The ML Skills That Still Matter in 2026" title="The ML Skills That Still Matter in 2026" srcset="https://substackcdn.com/image/fetch/$s_!xNHD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!xNHD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!xNHD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!xNHD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7481d4bb-f363-449a-bacb-b3f38b585a86_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A lot of data professionals are trying to decide what to learn next.</p><p>That is understandable.</p><p>The field feels crowded now. There is classical machine learning. Deep learning. Generative AI. RAG. Agents. Evaluation. MLOps. Analytics engineering. Data products. Every few months, a new tool or workflow becomes the thing everyone talks about.</p><p>So the question becomes narrow:</p><p><em>Which machine learning skills are still worth learning in 2026?</em></p><p>The answer is not &#8220;learn every model.&#8221;<br>It is also not &#8220;ML is dead because GenAI can do everything.&#8221;</p><p>A better answer is this:</p><p><strong>The ML skills that still matter are the skills that help you turn data into reliable decisions.</strong></p><p>That includes framing the problem, checking the data, building baselines, choosing metrics, validating properly, analyzing errors, and monitoring what happens after deployment.</p><p>These skills matter because AI adoption is no longer experimental. Research shows that generative AI adoption has moved quickly into the mainstream, with reported population adoption reaching 53% within three years. But adoption does not automatically mean reliability. The more AI systems enter daily workflows, the more important it becomes to know whether the output can be trusted.</p><p>That is where machine learning fundamentals still matter.</p><p>By the end of this article, you should have a clearer answer to three questions:</p><ul><li><p>Which ML skills are still worth learning?</p></li><li><p>Why do they still matter in a GenAI-heavy world?</p></li><li><p>What should you actually practice if you want to stay useful as a data professional?</p></li></ul><p>Curious about it? 
Let&#8217;s get into it.</p><div><hr></div><h1>1. Turn vague business problems into ML problems</h1><p>The first ML skill that still matters is problem framing.</p><p>Most weak ML projects do not fail because the model was not advanced enough. They fail because the problem was unclear from the beginning.</p><p>Someone says:<br>&#8220;Can we predict churn?&#8221;</p><p>That sounds like a machine learning problem. But it is not specific enough yet. Before modeling, you need to pin down several definitions. For example:</p><p><strong>What is the prediction target?</strong> Churn within the next 30 days</p><p><strong>What is the prediction unit?</strong> One customer account</p><p><strong>When is the prediction made?</strong> Every Monday morning</p><p><strong>What data is available at that time?</strong> Activity, billing, support, and product usage up to Sunday night</p><p><strong>What action will follow the prediction?</strong> Retention team contacts high-risk accounts</p><p><strong>What does success mean?</strong> Lower churn rate, higher retained revenue, better prioritization</p><p>Without these definitions, the model may still produce a score. But the score may not be useful.</p><p>This is the first practical lesson: a machine learning problem is not defined by the model you use. It is defined by the decision you want to improve.</p><p>A good ML practitioner knows how to translate a messy business request into something testable. That means asking whether we are predicting, ranking, classifying, detecting, recommending, or forecasting. 
You need to identify the exact target variable, the relevant time window, the information available at prediction time, the end user of the result, and exactly what they will do differently because of the model.</p><p>This skill still matters in 2026 because GenAI tools can help you write code faster, but they do not automatically define the right problem for you. If the target is wrong, the rest of the pipeline is wasted effort.</p><div><hr></div><h1>2. Audit the data before trusting the model</h1><p>The second skill is data auditing. This is more than &#8220;clean the data.&#8221;</p><p>Cleaning data usually means handling missing values, fixing formats, removing duplicates, and standardizing columns. Data auditing goes deeper. It asks whether the data is actually valid for the problem. For an ML project, you need to inspect at least five things:</p><p><strong>Label quality: </strong>Is the target variable correct and consistently defined?</p><p><strong>Data availability: </strong>Would these features be available when the prediction is made?</p><p><strong>Leakage: </strong>Does the dataset contain future information?</p><p><strong>Sampling bias: </strong>Does the training data represent the population where the model will be used?</p><p><strong>Stability: </strong>Are the patterns likely to hold over time?</p><p>Data leakage is especially important. Scikit-learn&#8217;s documentation warns that leakage can produce overly optimistic performance estimates because information from the test set or future data accidentally enters the training process.</p><p>A simple example:</p><p>You are building a churn model. Your dataset includes a column called <code>cancellation_reason</code>. The model performs extremely well.</p><p>But there is a problem. That column only exists after the customer has already churned. The model is not learning early churn signals. It is reading the answer key.</p><p>This happens more often than people admit. 
It can appear in many forms:</p><p>For example, in <strong>credit risk</strong>, a model might use a manual review result that is only recorded after the application is submitted. In <strong>fraud detection</strong>, it might use an investigation outcome as an input feature.</p><p>This is why data auditing remains a core ML skill. A data professional should be able to look at a dataset and ask: &#8220;Would I know this information at the moment I need to make the prediction?&#8221; If the answer is no, the feature probably should not be there.</p><div><hr></div><h1>3. Build simple baselines before complex models</h1><p>The third skill is baseline building. A baseline is the simplest model or rule that gives you a reference point. It could be a majority-class classifier, a simple business rule, logistic regression, a decision tree, a moving average, a keyword-based classifier, or a simple ranking score.</p><p>Baselines are not boring. They are protection against unnecessary complexity.</p><p>Google&#8217;s Rules of Machine Learning still emphasize starting with simple models and robust infrastructure before moving into more complex ML systems. That advice has aged well because the easiest mistake in modern AI work is to overbuild too early.</p><p>Suppose you are building a lead scoring model. A simple baseline might be: score leads higher if they visited the pricing page, opened two emails, and work at a company above a certain size.</p><p>Then you compare that with a logistic regression model. Then maybe a gradient boosting model. 
Only after that should you consider something more complex.</p><p>The practical question is not: &#8220;Can I use a more advanced model?&#8221;</p><p>The better question is: &#8220;Does the more advanced model produce enough improvement to justify the added complexity?&#8221;</p><p>That complexity includes harder debugging, higher maintenance costs, more difficult explanations, more monitoring requirements, more fragile deployments, and more stakeholder confusion.</p><p>In many real business settings, the best model is not the most advanced model. It is the simplest model that performs well enough and can be trusted by the people who need to use it.</p><div><hr></div><h1>4. Choose metrics based on the cost of mistakes</h1><p>The fourth skill is metric selection. This is where many ML projects become misleading. People often ask: &#8220;What is the model&#8217;s accuracy?&#8221;</p><p>But accuracy is not always the right metric. For example, imagine a fraud detection dataset where only 1% of transactions are fraudulent. A model that predicts &#8220;not fraud&#8221; for everything can be 99% accurate.</p><p>That sounds excellent. It is also useless.</p><p>Metric choice depends on the type of mistake you care about.</p><p>For instance, in <strong>fraud detection</strong>, you care about recall, precision, and false positive rate. In <strong>medical screening</strong>, recall and false negatives are critical. In <strong>spam detection</strong>, the focus is on precision and false positives. In <strong>credit risk</strong>, you look at calibration, precision, recall, and expected loss. In <strong>recommendation systems</strong>, ranking metrics, conversion, and retention matter most.</p><p>This is one of the most useful ML skills for real work: You need to connect the metric to the decision.</p><p>If the business can only contact 500 customers per week, then overall accuracy may not matter much. 
What matters is whether the top 500 predicted customers are actually worth contacting.</p><p>In that case, you may care about precision at K, lift in the top decile, expected retained revenue, conversion from intervention, or the cost per saved customer.</p><p>The right metric depends on the operating reality. This is why ML is not only a technical exercise. A model is evaluated inside a business process. If the metric ignores that process, the evaluation is incomplete.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/the-ml-skills-that-still-matter-in?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/the-ml-skills-that-still-matter-in?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h1>5. Validate without fooling yourself</h1><p>The fifth skill is validation. This is where you test whether the model can generalize.</p><p>A model can perform well on the data it has seen. That does not mean it will perform well on new data. This is why train/test splits, cross-validation, time-based validation, and holdout sets still matter.</p><p>Scikit-learn&#8217;s cross-validation documentation highlights the importance of evaluating estimator performance properly and avoiding pitfalls that make performance estimates unreliable.</p><p>The important idea is simple: Your validation setup should imitate the real situation where the model will be used.</p><p>For random customer classification, a random train/test split might be fine. For time-dependent problems, it may not be. If you are forecasting demand, predicting churn, detecting fraud, or scoring leads over time, you often need time-based validation. 
That means training on the past and testing on the future.</p><p>For example, you might train on January to March and validate on April, then train on January to April and validate on May, then train on January to May and validate on June.</p><p>This is closer to the real world. You do not get to train on June data to predict April. Validation should respect time.</p><p>A strong data professional knows how to ask:</p><ul><li><p>Was the test set kept separate?</p></li><li><p>Was the preprocessing fitted only on the training data?</p></li><li><p>Was the split random when it should have been time-based?</p></li><li><p>Were duplicate users or records split across train and test?</p></li><li><p>Was the test set reused too many times during model selection?</p></li><li><p>Does performance hold across segments?</p></li></ul><p>This skill matters because bad validation creates false confidence. And false confidence is dangerous.</p><div><hr></div><h1>6. Analyze errors, not just scores</h1><p>The sixth skill is error analysis. A single score is not enough.</p><p>A model with 88% accuracy may still fail badly for the most important cases. A forecasting model with a good average error may still miss peak demand. A churn model may perform well overall but fail for enterprise customers. A document classifier may work for clean English documents but fail for short, messy, multilingual text.</p><p>This is why error analysis matters. 
After you evaluate the model, you should inspect where it fails.</p><p>A practical error analysis checklist, organized by segment, might look like this:</p><p><strong>New users</strong>: Does the model fail when history is limited?</p><p><strong>High-value customers</strong>: Does it work for the accounts that matter most?</p><p><strong>Geography</strong>: Does performance differ by country or region?</p><p><strong>Product category</strong>: Does it fail on long-tail categories?</p><p><strong>Time period</strong>: Does it degrade during holidays or campaigns?</p><p><strong>Input quality</strong>: Does messy or incomplete data hurt performance?</p><p><strong>Minority class</strong>: Does the model ignore rare but important cases?</p><p>This is where ML work becomes diagnostic. You stop asking only: &#8220;Is the model good?&#8221; You start asking: &#8220;Where is the model useful, where is it weak, and where should we not trust it?&#8221;</p><p>That is a much better standard. This skill also connects directly to GenAI evaluation. When evaluating an LLM workflow, the same habit applies. You do not only ask whether the average output looks good. You ask which prompts fail, which user intents fail, which document types fail, which languages fail, which edge cases trigger hallucinations, and which outputs need human review.</p><p>The models changed, but the evaluation habit still transfers.</p><div><hr></div><h1>7. Understand monitoring after deployment</h1><p>The seventh skill is model monitoring. A model is not finished when it is deployed. It enters a changing environment.</p><p>Customer behavior changes. Product features change. Fraud patterns change. Marketing channels change. Economic conditions change. Data pipelines change. 
Even column definitions can change.</p><p>Google Cloud&#8217;s model monitoring documentation discusses feature skew and drift detection for deployed models, including changes in categorical and numerical input features.</p><p>For most data professionals, the key idea is not complicated: A model can become worse even if the code does not change.</p><p>That means you need to monitor:</p><p><strong>Data freshness: </strong>Did the latest batch arrive?</p><p><strong>Input distribution: </strong>Are feature values changing?</p><p><strong>Prediction distribution: </strong>Are model scores suddenly higher or lower?</p><p><strong>Segment performance: </strong>Is one customer group degrading faster?</p><p><strong>Business outcome: </strong>Is the model still improving the intended metric?</p><p><strong>Pipeline health: </strong>Are transformations still working correctly?</p><p><strong>Human feedback: </strong>Are users overriding or ignoring the model?</p><p>This is one reason ML skills remain useful in 2026. Many AI systems fail quietly. They do not always break in obvious ways. They drift. They degrade. They become misaligned with the current workflow. Monitoring is how you notice before the damage becomes expensive.</p><div><hr></div><h1>8. Explain the model in business terms</h1><p>The eighth skill is communication. But not generic communication. The useful skill is being able to explain the model in terms of decisions, trade-offs, and risk.</p><p>A stakeholder does not only need to know that the model has 0.82 AUC. They need to know what that means.</p><p>For example: &#8220;The model is useful for ranking customers by churn risk, but it should not be used as an automatic cancellation prediction. It works better for customers with at least three months of activity history. For new customers, the signal is weaker.&#8221;</p><p>That explanation is much more useful than a metric alone. 
</p><p>A good ML explanation should include the purpose (what decision the model supports), the scope (where it should and should not be used), the performance (how well it works, and compared to what), the failure modes (where it performs poorly), the trade-offs (what happens if we optimize for precision versus recall), the action (what users should do with the output), and the monitoring (what needs to be checked after deployment).</p><p>This is especially important as AI systems become more embedded in business workflows. NIST&#8217;s AI Risk Management Framework emphasizes test, evaluation, verification, and validation across the AI lifecycle. That kind of thinking is not only for regulators or governance teams. It is also practical for data professionals who need to explain when a model is reliable enough to use.</p><p>The best ML people are not only model builders. They are translators between model behavior and business action.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/the-ml-skills-that-still-matter-in/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/the-ml-skills-that-still-matter-in/comments"><span>Leave a comment</span></a></p><div><hr></div><h1>So what should you learn in 2026?</h1><p>If you are learning ML now, do not structure your learning around a long list of algorithms. Structure it around the workflow. 
Here is a better learning path.</p><p><strong>Problem framing: </strong>Prediction target, decision point, action, success metric. <em>Practice: </em>A one-page ML problem definition.</p><p><strong>Data audit: </strong>Leakage, label quality, missingness, sampling, availability. <em>Practice: </em>A data quality checklist.</p><p><strong>Baseline modeling: </strong>Rules, logistic regression, trees, simple benchmarks. <em>Practice: </em>A baseline comparison table.</p><p><strong>Metric selection: </strong>Precision, recall, F1, AUC, calibration, ranking metrics. <em>Practice: </em>A metric justification note.</p><p><strong>Validation: </strong>Train/test split, cross-validation, time-based split. <em>Practice: </em>A validation design.</p><p><strong>Error analysis: </strong>Segment performance, false positives, false negatives. <em>Practice: </em>An error analysis report.</p><p><strong>Monitoring: </strong>Drift, skew, data freshness, prediction distribution. <em>Practice: </em>A monitoring checklist.</p><p><strong>Communication: </strong>Trade-offs, limitations, recommended use. <em>Practice: </em>A stakeholder summary.</p><p>This is the part many courses skip. They teach the model before the workflow.</p><p>But in real work, the workflow is what makes the model useful.</p><h1>The skills that still matter</h1><p>So, what ML skills still matter in 2026? These ones:</p><ul><li><p>Framing a vague business problem into a clear ML task</p></li><li><p>Auditing data before trusting it</p></li><li><p>Detecting leakage and bad labels</p></li><li><p>Building simple baselines</p></li><li><p>Choosing metrics based on business cost</p></li><li><p>Validating models correctly</p></li><li><p>Analyzing errors by segment</p></li><li><p>Monitoring model behavior after deployment</p></li><li><p>Explaining limitations clearly</p></li></ul><p>These skills are not trendy. But they are durable. They matter for classical ML. They matter for GenAI evaluation. 
They matter for RAG systems. They matter for AI products.</p><p>They matter whenever someone asks: &#8220;Can we trust this output enough to use it?&#8221;</p><p>That is the real reason ML still matters. Not because every data professional needs to train models from scratch. Not because classical ML is competing with GenAI. But because ML teaches the discipline behind reliable AI work.</p><p>In 2026, the most valuable data professionals will not be the ones who chase every new tool. They will be the ones who can build, evaluate, question, and explain AI systems clearly.</p><p>That is still machine learning.</p><p>And it still matters.</p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[What Real SQL Work Taught Me About Being a Data Scientist]]></title><description><![CDATA[Why I stopped seeing SQL as a secondary skill and started seeing it as the backbone of real data projects]]></description><link>https://www.nb-data.com/p/what-real-sql-work-taught-me-about</link><guid isPermaLink="false">https://www.nb-data.com/p/what-real-sql-work-taught-me-about</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Sat, 28 Mar 2026 15:07:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7CmX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7CmX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!7CmX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7CmX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7CmX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7CmX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7CmX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg" width="1312" height="736" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:736,&quot;width&quot;:1312,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90217,&quot;alt&quot;:&quot;What Real SQL Work Taught Me About Being a Data 
Scientist&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/192418111?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="What Real SQL Work Taught Me About Being a Data Scientist" title="What Real SQL Work Taught Me About Being a Data Scientist" srcset="https://substackcdn.com/image/fetch/$s_!7CmX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7CmX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7CmX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7CmX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4abfb659-bac8-4b53-b22e-24fcad7b32ea_1312x736.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 
7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image by Ideogram.ai</figcaption></figure></div><div class="pullquote"><p>"Real SQL work taught me that trustworthy definitions matter more than flashy queries."</p></div><h1>I did not start by taking SQL seriously</h1><p style="text-align: justify;">Early in my career, I did not see SQL as central to being a data scientist. Most of my learning was built around Python, and the classes and bootcamps I joined reinforced that view. Python felt like the real language of data science. SQL felt useful, but distant.</p><p>So I did not reject it. I simply did not get enough exposure to it.</p><p>That distinction matters. When your early learning path is dominated by notebooks, models, and Python libraries, it is easy to assume that the real work starts once the data is already in front of you. In that worldview, SQL looks like preparation work. Helpful, yes. 
Foundational, no.</p><p>Real work changed that view gradually.</p><p>The more I worked in corporate settings, the clearer it became that many projects do not begin with modeling, dashboards, or machine learning. They begin with a more basic set of questions: Is the data available? Is the definition correct? Can the result be trusted enough for someone to act on it?</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>Work forced the lesson</h1><p>What changed my view of SQL was not one dramatic moment. It was the accumulation of projects. Again and again, the work pulled me toward the same reality: before anything becomes an analysis, a model, or a recommendation, someone has to make sure the data is available, correctly defined, and usable.</p><p>That is where SQL kept appearing.</p><p>Sometimes the request looked simple. A business team needed a report. Sometimes the request sounded more strategic. A project needed insight to inform a decision. Sometimes the work moved beyond a single analysis into the project's production life. In each case, SQL mattered not only for retrieving data, but for deciding whether the project itself rested on a solid foundation.</p><p>The difficult conversations were often not about syntax at all. They were about meaning. What exactly should count as a sale? Which time window should be used? Which source should be treated as the source of truth? 
If two tables produce different answers, which one reflects the real business process?</p><p>That was the point where SQL stopped feeling like a supporting skill and became infrastructure.</p><h1>What real SQL work actually looked like</h1><p>The lesson became clearer through a few recurring types of work. These were not glamorous. They were simply the places where SQL kept proving its value.</p><ul><li><p>Ad-hoc reporting and insight requests that looked simple but hid messy logic and scattered data.</p></li><li><p>Metric definition work, where the challenge was deciding what should count before writing the query.</p></li><li><p>Combining multiple data sources without destroying the business meaning of the result.</p></li><li><p>Preparing the right data for downstream analysis and modeling in Python.</p></li></ul><div><hr></div><h2>1. Ad-hoc reporting taught me that simple requests are rarely simple</h2><p>A lot of real SQL work starts with a seemingly harmless request. The business needs a report. Someone wants a quick performance update. A team asks for insight before a meeting. On paper, it sounds like a straightforward query.</p><p>In practice, it rarely is.</p><p>Sometimes the data is not available in one place. Sometimes it lives across several sources that were never designed to fit together neatly. Sometimes the logic needed to answer the question is more complicated than the request suggests. And often the timeline is short, so you do not have the luxury of slowly wading through the data.</p><p>That changed how I think about SQL skills. In real reporting work, the challenge is not just writing something that runs. The challenge is moving from a vague business question to a reliable answer under real constraints. That takes judgment, prioritization, and a clear sense of what the output needs to mean.</p><p>Useful SQL work is often less glamorous than people expect. It is not always about elegant tricks. 
Very often, it is about getting the right answer quickly enough to matter, without breaking the logic behind it.</p><div><hr></div><h2>2. Metric definition matters more than query complexity</h2><p>If there is one area where real SQL work changed me the most, it is the definition of metrics.</p><p>In theory, a metric looks clean. In practice, even something as familiar as a sales number can go wrong depending on the time scope, exclusions, business rules, and source tables. A number can look precise and still be misleading if two teams are working from different assumptions or if one table captures the event differently from another.</p><p>That is why some SQL problems cannot be solved by clever syntax alone. You can write a technically correct query and still produce the wrong business answer.</p><p>The real work is often more basic and more demanding at the same time:</p><ul><li><p>deciding what should count</p></li><li><p>deciding what should be excluded</p></li><li><p>choosing which table reflects the operational truth</p></li><li><p>making sure the result matches the way the business actually works</p></li></ul><p>This is where collaboration becomes essential. There are many situations where the data exists, but understanding it requires discussion with business users who know the process behind the records. Without that alignment, a query may return rows but not the truth.</p><p>Over time, I started to see that some of the most dangerous problems in data work are not computational. They are definitional. 
A wrong definition can quietly damage a project, mislead stakeholders, or erode trust in the team long before anyone notices the issue.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/what-real-sql-work-taught-me-about?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/what-real-sql-work-taught-me-about?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h2>3. Combining data sources is harder than it looks</h2><p>Another lesson real SQL work taught me is that combining information from multiple sources without losing meaning is much harder than it first appears.</p><p>From the outside, joins can look like a purely technical step. In practice, they can become one of the most delicate parts of a project. Sometimes a clean primary key does not exist. Sometimes the relationship is not direct. Sometimes aggregation is needed before two datasets can even be compared. And sometimes each source reflects a slightly different view of the same business concept.</p><p>That creates several risks at once: duplicate rows, dropped records, timing mismatches, and numbers that appear structurally valid but are conceptually incorrect.</p><p>This is why SQL work often requires more collaboration than people expect. To combine sources responsibly, you frequently need validation from multiple stakeholders. The challenge is not merely to make the query run. The challenge is to preserve validity.</p><p>For me, this was one of the clearest moments where SQL became inseparable from business understanding. Good SQL was not just about retrieval. It was about preserving meaning as it moved across systems.</p><div><hr></div><h2>4. 
Even Python-heavy data science often begins with SQL</h2><p>Because my early learning path emphasized Python, I initially imagined that most serious data-science work would begin there. In reality, SQL was often necessary before I could even start proper work in Python.</p><p>If the data lived in a SQL database, then SQL was the gatekeeper. It was how I extracted the relevant population, selected the appropriate time window, assembled the required columns, and checked whether the data were suitable for the task ahead. Whether the next step was exploratory analysis, feature preparation, modeling, or evaluation, SQL was often the first step.</p><p>That changed how I think about the relationship between SQL and data science. SQL is not simply what happens before the interesting work. Very often, it is part of the interesting work.</p><p>If the population is wrong, the feature set is incomplete, or the definition is unstable, the downstream Python work inherits that weakness. In that sense, SQL does not sit beneath data science. It sits inside it.</p><div><hr></div><h1>What I value in SQL work now</h1><p>Real work also changed how I evaluate SQL skills in others and in myself.</p><p>I still care about writing cleaner, more efficient queries, especially as data grows larger and execution speed matters. But that is no longer the first thing I look for.</p><p>What I value first is this:</p><blockquote><p>1. Correctness. The wrong data can quietly damage an entire project.</p><p>2. Stakeholder trust. Data work only becomes valuable when other people believe the result is dependable.</p><p>3. Maintainability. Many projects do not end after a single request, so someone has to live with the logic later.</p></blockquote><p>A strong SQL practitioner, in my view, is not simply someone who knows a large amount of syntax. 
It is someone who understands the data definition, knows how to acquire the data in the most reliable way, and can produce logic that remains useful beyond the moment it was written.</p><div><hr></div><h1>What I would tell aspiring data scientists now</h1><p>If your learning path has focused mostly on Python, I would say this clearly: do not treat SQL as optional.</p><p>You do not need to memorize every feature of the language before doing meaningful work. Documentation exists, and syntax can be learned as needed. But you do need to understand why SQL matters. It matters because data projects depend on access to the right data, under the right definitions, with logic that can withstand real business use.</p><p>That is the part I wish I had understood earlier. SQL is not important because it looks technical. It is important because it sits close to the truth conditions of data work. It is where data availability is tested. It is where definitions get challenged. It is where numbers either become trustworthy or fall apart.</p><p>For me, that has become one of the clearest professional lessons of real data work. SQL is not the opposite of data science, nor is it a lower-level skill beneath it. In many organizations, SQL is one of the foundations that allows data science to be useful at all.</p><p>And if there is one line I would leave readers with, it is this: real SQL work taught me that trustworthy definitions matter more than flashy queries.</p><p>If you are learning SQL now, learn it through real use cases. Learn it through reporting, metric definition, source validation, and the kind of business questions that force you to care about correctness. 
</p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/what-real-sql-work-taught-me-about/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/what-real-sql-work-taught-me-about/comments"><span>Leave a comment</span></a></p>]]></content:encoded></item><item><title><![CDATA[Best Stock Market data API in the AI Agent era]]></title><description><![CDATA[How modern financial APIs are powering the next generation of AI-driven market research and trading tools]]></description><link>https://www.nb-data.com/p/best-stock-market-data-api-in-the</link><guid isPermaLink="false">https://www.nb-data.com/p/best-stock-market-data-api-in-the</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Sat, 14 Mar 2026 07:49:27 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1560221328-12fe60f83ab8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxzdG9ja3xlbnwwfHx8fDE3NzMxNDkwODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1560221328-12fe60f83ab8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxzdG9ja3xlbnwwfHx8fDE3NzMxNDkwODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1560221328-12fe60f83ab8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxzdG9ja3xlbnwwfHx8fDE3NzMxNDkwODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1560221328-12fe60f83ab8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxzdG9ja3xlbnwwfHx8fDE3NzMxNDkwODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1560221328-12fe60f83ab8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxzdG9ja3xlbnwwfHx8fDE3NzMxNDkwODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1560221328-12fe60f83ab8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxzdG9ja3xlbnwwfHx8fDE3NzMxNDkwODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1560221328-12fe60f83ab8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxzdG9ja3xlbnwwfHx8fDE3NzMxNDkwODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="4800" height="3188" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1560221328-12fe60f83ab8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxzdG9ja3xlbnwwfHx8fDE3NzMxNDkwODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3188,&quot;width&quot;:4800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;close-up photo of monitor displaying graph&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="close-up photo of monitor displaying graph" title="close-up photo of monitor displaying graph" 
srcset="https://images.unsplash.com/photo-1560221328-12fe60f83ab8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxzdG9ja3xlbnwwfHx8fDE3NzMxNDkwODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1560221328-12fe60f83ab8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxzdG9ja3xlbnwwfHx8fDE3NzMxNDkwODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1560221328-12fe60f83ab8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxzdG9ja3xlbnwwfHx8fDE3NzMxNDkwODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1560221328-12fe60f83ab8?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyfHxzdG9ja3xlbnwwfHx8fDE3NzMxNDkwODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@bash__profile">Nicholas Cappello</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>The stock market data API landscape is changing. In the past, developers mostly evaluated providers on familiar dimensions: coverage, latency, pricing, documentation, and reliability. Those criteria still matter, but the rise of LLM-powered copilots, autonomous research workflows, and multi-agent financial systems has introduced a new requirement: how easily a data provider can plug into agentic software.</p><p>In that environment, the strongest providers are not just the ones with broad datasets. They are the ones that expose clean, structured interfaces that AI agents can query, reason over, and combine with downstream tools for analysis, monitoring, and decision support. Some vendors are already leaning into this shift with MCP servers and LLM-oriented resources. Others remain stronger as enterprise data backbones than as explicitly AI-native platforms.</p><p>In this article, we will explore the best stock market data APIs in the AI agent era. Curious about it? 
</p><p>Let&#8217;s get into it.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1><strong>Alpha Vantage</strong></h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xh-Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8db652-ca9d-40e1-be34-c06229ff5dc2_2474x1278.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xh-Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8db652-ca9d-40e1-be34-c06229ff5dc2_2474x1278.png 424w, https://substackcdn.com/image/fetch/$s_!xh-Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8db652-ca9d-40e1-be34-c06229ff5dc2_2474x1278.png 848w, https://substackcdn.com/image/fetch/$s_!xh-Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8db652-ca9d-40e1-be34-c06229ff5dc2_2474x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!xh-Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8db652-ca9d-40e1-be34-c06229ff5dc2_2474x1278.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xh-Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8db652-ca9d-40e1-be34-c06229ff5dc2_2474x1278.png" width="1456" height="752" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d8db652-ca9d-40e1-be34-c06229ff5dc2_2474x1278.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:752,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:182857,&quot;alt&quot;:&quot;Best Stock Market data API in the AI Agent era&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/190589167?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8db652-ca9d-40e1-be34-c06229ff5dc2_2474x1278.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Best Stock Market data API in the AI Agent era" title="Best Stock Market data API in the AI Agent era" srcset="https://substackcdn.com/image/fetch/$s_!xh-Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8db652-ca9d-40e1-be34-c06229ff5dc2_2474x1278.png 424w, https://substackcdn.com/image/fetch/$s_!xh-Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8db652-ca9d-40e1-be34-c06229ff5dc2_2474x1278.png 848w, https://substackcdn.com/image/fetch/$s_!xh-Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8db652-ca9d-40e1-be34-c06229ff5dc2_2474x1278.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xh-Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8db652-ca9d-40e1-be34-c06229ff5dc2_2474x1278.png 1456w" sizes="100vw"></picture></div></a></figure></div><h3><strong>Overview</strong></h3><p><a href="https://www.alphavantage.co/">Alpha Vantage</a> is a widely used financial data platform that provides real-time and historical market data APIs for equities, options, forex, cryptocurrencies, and macroeconomic indicators. 
The platform is designed to support both individual developers and professional trading systems through a simple, developer-friendly interface and a large catalog of market datasets.<br><br>A distinguishing feature of Alpha Vantage is the breadth of its data coverage. The platform delivers real-time and historical financial market data through programmatic APIs and spreadsheet integrations, enabling developers to build trading dashboards, quantitative research pipelines, and automated trading tools on top of a unified data interface.<br><br>The API also provides a rich library of built-in analytics&#8212;including technical indicators and fundamental datasets&#8212;allowing users to retrieve both raw market data and higher-level financial signals without implementing complex calculations themselves. In practice, this makes Alpha Vantage a flexible backbone for applications ranging from educational projects and fintech prototypes to production trading systems and investment research platforms.</p><h3><strong>What makes it valuable in the AI Agent era?</strong></h3><p>Alpha Vantage has become particularly relevant in the emerging ecosystem of LLM-powered financial tools and autonomous AI agents, largely because it provides structured market data in formats that are easy for agents and models to access, reason over, and integrate into automated workflows.<br><br><strong>1. Native integration with AI agent ecosystems via MCP</strong><br>Alpha Vantage provides an official Model Context Protocol (MCP) server, enabling large language models and agent-based applications to directly access financial data through standardized tools. The MCP server allows AI assistants and development environments to query real-time and historical stock market data programmatically, turning the API into a plug-and-play data source for agentic systems. <br><br><strong>2. 
Compatibility with multi-agent financial research systems</strong><br>Modern agentic trading frameworks increasingly rely on structured financial APIs like Alpha Vantage as data sources. For example, the open-source <a href="https://trading-agents.ai/">TradingAgents framework</a> simulates a professional trading firm using multiple LLM-powered agents&#8212;such as fundamental analysts, technical analysts, sentiment analysts, traders, and risk managers&#8212;that collaborate to analyze equities and make decisions. The framework uses the Alpha Vantage API as its core data backbone.<br><br><strong>3. Documentation and developer assets optimized for machine consumption</strong><br>Another advantage in the LLM era is the structure and accessibility of Alpha Vantage&#8217;s developer resources. The platform provides comprehensive API documentation, examples, and community libraries across many programming languages, making it straightforward for both humans and AI coding agents to integrate financial data pipelines. 
Because LLM-powered development tools rely heavily on structured documentation, well-defined API endpoints, and example code, this ecosystem of docs, SDKs, and README files makes Alpha Vantage particularly easy for AI systems to learn and use.</p><h3><strong>In short</strong></h3><p>Alpha Vantage&#8217;s combination of structured financial APIs, an MCP interface for AI agents, and extensive developer documentation positions it as a data infrastructure layer for the emerging generation of AI-powered trading tools, research agents, and autonomous financial analysis systems.</p><div><hr></div><h2>Tradier</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5L68!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f43f50f-00df-4007-8f4e-b28403ddf3a0_2399x1459.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5L68!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f43f50f-00df-4007-8f4e-b28403ddf3a0_2399x1459.png 424w, https://substackcdn.com/image/fetch/$s_!5L68!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f43f50f-00df-4007-8f4e-b28403ddf3a0_2399x1459.png 848w, https://substackcdn.com/image/fetch/$s_!5L68!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f43f50f-00df-4007-8f4e-b28403ddf3a0_2399x1459.png 1272w, https://substackcdn.com/image/fetch/$s_!5L68!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f43f50f-00df-4007-8f4e-b28403ddf3a0_2399x1459.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5L68!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f43f50f-00df-4007-8f4e-b28403ddf3a0_2399x1459.png" width="1456" height="885" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f43f50f-00df-4007-8f4e-b28403ddf3a0_2399x1459.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:885,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:848183,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/190589167?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f43f50f-00df-4007-8f4e-b28403ddf3a0_2399x1459.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5L68!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f43f50f-00df-4007-8f4e-b28403ddf3a0_2399x1459.png 424w, https://substackcdn.com/image/fetch/$s_!5L68!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f43f50f-00df-4007-8f4e-b28403ddf3a0_2399x1459.png 848w, https://substackcdn.com/image/fetch/$s_!5L68!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f43f50f-00df-4007-8f4e-b28403ddf3a0_2399x1459.png 1272w, https://substackcdn.com/image/fetch/$s_!5L68!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f43f50f-00df-4007-8f4e-b28403ddf3a0_2399x1459.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Overview</h3><p><a href="https://tradier.com/">Tradier</a> is a brokerage-focused API platform that combines market data, account access, and trading functionality. Its public API supports real-time, delayed, and historical market data through both request/response endpoints and streaming interfaces, while also exposing brokerage capabilities such as account information, positions, orders, watchlists, and trade execution.</p><p>A key differentiator is that Tradier is not just a data API. It is part of a brokerage stack. 
That means developers can use it not only to retrieve quotes, options chains, time-and-sales data, and historical pricing, but also to connect agentic workflows directly to trading and portfolio actions. Tradier also supports HTTP and WebSocket streaming, which is useful when building systems that need fast updates rather than purely batch-style analysis.</p><p>Tradier&#8217;s market-data positioning is more U.S.-brokerage-centric than that of broad all-asset-class research platforms. Real-time data is available to Tradier Brokerage account holders for U.S. stocks and options, and delayed data follows the standard 15-minute model for non-real-time access. That makes Tradier particularly compelling for execution-oriented applications rather than for the widest possible global dataset footprint.</p><h3>What makes it valuable in the AI Agent era?</h3><p><strong>1. MCP and LLM-oriented documentation</strong><br>Tradier is unusually forward-leaning in how it prepares its docs for the LLM era. Its documentation includes <code>llms.txt</code>, dedicated LLM resources, and a Tradier MCP section. Tradier&#8217;s own MCP documentation says users can access market data, account details, and documentation, and even place trades, from within connected AI tools. That makes Tradier one of the few providers publicly bridging financial APIs and conversational interfaces in a first-class way.</p><p><strong>2. Strong fit for execution-capable agents</strong><br>Many financial APIs stop at data retrieval. Tradier goes further by combining data access with brokerage actions such as order placement, account history, positions, and balances. In the AI agent era, that matters because the most interesting systems are often not just research agents but action-taking agents. Tradier is therefore especially relevant for developers building guarded execution workflows, trading copilots, or semi-autonomous assistants that need both read and act capabilities.</p><p><strong>3. 
Streaming interfaces for real-time agent loops</strong><br>Tradier supports both HTTP and WebSocket streaming for market and account data. That is important for agent architectures that continuously monitor events, react to intraday changes, or trigger downstream workflows when market conditions shift. In practical terms, Tradier is better suited than batch-only APIs for event-driven agents that need live context rather than periodic polling alone.</p><h3>In short</h3><p>Tradier is one of the strongest options for AI agents that need to move beyond analysis into brokerage-connected workflows. It may not be the broadest general-purpose research API, but for U.S.-market, execution-aware agents, Tradier&#8217;s mix of market data, account endpoints, streaming support, and MCP/LLM resources makes it highly relevant.</p><div><hr></div><h2>Xignite</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mBY8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcbb256b-df66-44d4-909a-a9faf81debb0_2777x1331.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mBY8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcbb256b-df66-44d4-909a-a9faf81debb0_2777x1331.png 424w, https://substackcdn.com/image/fetch/$s_!mBY8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcbb256b-df66-44d4-909a-a9faf81debb0_2777x1331.png 848w, https://substackcdn.com/image/fetch/$s_!mBY8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcbb256b-df66-44d4-909a-a9faf81debb0_2777x1331.png 1272w, 
https://substackcdn.com/image/fetch/$s_!mBY8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcbb256b-df66-44d4-909a-a9faf81debb0_2777x1331.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mBY8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcbb256b-df66-44d4-909a-a9faf81debb0_2777x1331.png" width="1456" height="698" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bcbb256b-df66-44d4-909a-a9faf81debb0_2777x1331.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:698,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2318388,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/190589167?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcbb256b-df66-44d4-909a-a9faf81debb0_2777x1331.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mBY8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcbb256b-df66-44d4-909a-a9faf81debb0_2777x1331.png 424w, https://substackcdn.com/image/fetch/$s_!mBY8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcbb256b-df66-44d4-909a-a9faf81debb0_2777x1331.png 848w, 
https://substackcdn.com/image/fetch/$s_!mBY8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcbb256b-df66-44d4-909a-a9faf81debb0_2777x1331.png 1272w, https://substackcdn.com/image/fetch/$s_!mBY8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcbb256b-df66-44d4-909a-a9faf81debb0_2777x1331.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Overview</h3><p><a href="https://www.xignite.com/">Xignite</a> is an enterprise financial data platform centered on cloud-delivered 
APIs and market-data management. Its catalog covers stock quotes, ETFs and mutual funds, foreign exchange, futures and options, indices and benchmarks, fixed income and rates, company fundamentals, reference data, earnings, and news. The company also emphasizes broad upstream sourcing, stating that its data comes from more than 250 providers, alongside curated in-house datasets.</p><p>Xignite&#8217;s public positioning is less &#8220;developer hobbyist API&#8221; and more enterprise-grade market data infrastructure. It highlights unlimited-usage pricing, flexible commercial packaging by asset class, call frequency, and region, and delivery models that include real-time, historical, and reference data. Its developer materials also show a broad set of products for delayed quotes, real-time quotes, historical data, streaming, alerts, IPOs, and company information.</p><p>That means Xignite is best understood as a data platform for institutions and mature fintech products rather than as a lightweight API-first experimentation layer. For many teams, that is a feature, not a drawback. In an AI stack, the most valuable data provider is often the one that can reliably serve as the normalized source behind internal models, orchestration layers, and production analytics systems. This last point is an inference from Xignite&#8217;s product positioning toward scalable enterprise delivery and market-data management.</p><h3>What makes it valuable in the AI Agent era?</h3><p><strong>1. Enterprise-grade breadth for multi-source agent pipelines</strong><br>AI agents become more useful when they can combine quotes, fundamentals, benchmarks, reference data, and news into a single reasoning loop. Xignite&#8217;s catalog is strong on this dimension. Because it covers a wide range of asset classes and reference datasets, it can act as the structured data layer beneath enterprise financial copilots and internal analyst tools.</p><p><strong>2. 
Strong fit for organizations building their own orchestration layer</strong><br>Unlike those of Alpha Vantage or EODHD, Xignite&#8217;s public materials emphasize APIs, coverage, and market-data management rather than agent-specific packaging. In practice, that makes it attractive for organizations that want to build their own AI architecture on top of a robust enterprise data backbone instead of depending on vendor-supplied MCP experiences. That reading is an inference from Xignite&#8217;s public positioning around cloud APIs, data management, and an unlimited-usage commercial structure.</p><p><strong>3. Flexible delivery for production-scale systems</strong><br>Xignite supports multiple delivery modes across real-time, delayed, historical, and streaming-style services, and it explicitly markets itself for demanding display applications, backtesting, alerts, and application integration. That flexibility matters in AI systems because not all components need the same data path: one model might need historical fundamentals, another might need event-driven market updates, and a third might need reference data normalization.</p><h3>In short</h3><p>Xignite is not the most visibly AI-marketed provider in this group, but it is a serious contender for enterprise AI finance stacks. 
If your goal is to build a proprietary agent platform on top of large-scale, normalized market-data services, Xignite&#8217;s breadth and infrastructure orientation make it more compelling than its relative lack of public AI branding might suggest.</p><div><hr></div><h2>EOD Historical Data</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W-VS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F488ecb5a-a062-4fc1-8602-441173a0ea4a_2381x1229.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W-VS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F488ecb5a-a062-4fc1-8602-441173a0ea4a_2381x1229.png 424w, https://substackcdn.com/image/fetch/$s_!W-VS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F488ecb5a-a062-4fc1-8602-441173a0ea4a_2381x1229.png 848w, https://substackcdn.com/image/fetch/$s_!W-VS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F488ecb5a-a062-4fc1-8602-441173a0ea4a_2381x1229.png 1272w, https://substackcdn.com/image/fetch/$s_!W-VS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F488ecb5a-a062-4fc1-8602-441173a0ea4a_2381x1229.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W-VS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F488ecb5a-a062-4fc1-8602-441173a0ea4a_2381x1229.png" width="1456" height="752" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/488ecb5a-a062-4fc1-8602-441173a0ea4a_2381x1229.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:752,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1281074,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/190589167?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F488ecb5a-a062-4fc1-8602-441173a0ea4a_2381x1229.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!W-VS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F488ecb5a-a062-4fc1-8602-441173a0ea4a_2381x1229.png 424w, https://substackcdn.com/image/fetch/$s_!W-VS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F488ecb5a-a062-4fc1-8602-441173a0ea4a_2381x1229.png 848w, https://substackcdn.com/image/fetch/$s_!W-VS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F488ecb5a-a062-4fc1-8602-441173a0ea4a_2381x1229.png 1272w, https://substackcdn.com/image/fetch/$s_!W-VS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F488ecb5a-a062-4fc1-8602-441173a0ea4a_2381x1229.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Overview</h3><p><a href="https://eodhd.com/">EOD Historical Data</a>, now commonly presented as EODHD, offers a broad financial data platform spanning fundamentals, historical end-of-day prices, live and real-time feeds, intraday data, U.S. options, financial news, stock screeners, technical indicators, and exchange/reference datasets. On its homepage, the company positions itself as a &#8220;one-stop shop&#8221; for 30+ years of historical, fundamental, and real-time data across global markets, with coverage figures including 60 stock exchanges and 150,000 tickers.</p><p>One of EODHD&#8217;s strengths is that it sits between lightweight developer tools and more professional research infrastructure. 
It offers structured JSON and CSV responses, coding libraries, spreadsheet add-ons, and a broad menu of market datasets without being limited to only one narrow workflow. It also exposes precomputed technical indicators through API endpoints rather than requiring users to calculate everything from raw time series.</p><p>This combination makes EODHD particularly attractive for builders who want reasonably broad market-data coverage and analytics features in a format that remains accessible to smaller teams, solo developers, and applied AI prototypes.</p><h3>What makes it valuable in the AI Agent era?</h3><p><strong>1. Official MCP support for agent integration</strong><br>EODHD provides an official MCP server for financial data and explicitly documents how to connect it to ChatGPT, Claude, and custom AI agents. The company describes this as a way for AI agents and LLMs to access real-time and historical financial data directly through MCP, making EODHD one of the clearest AI-era data providers alongside Alpha Vantage and Tradier.</p><p><strong>2. An official ChatGPT-oriented financial assistant</strong><br>Beyond MCP, EODHD also offers an official Financial Assistant for ChatGPT, which it describes as an AI that can generate code for EODHD APIs and provide finance insights grounded in real data and news. That does not just signal marketing interest in AI; it suggests the company is actively shaping its product and developer experience around LLM-driven usage patterns.</p><p><strong>3. Strong structured outputs plus higher-level analytics</strong><br>EODHD&#8217;s AI relevance is also practical. It provides structured JSON/CSV outputs, extensive API documentation, libraries, and technical-indicator endpoints that already package financial signals into machine-usable form. 
For agentic systems, that reduces the burden of transforming raw market data before it can be used in screening, summarization, ranking, or recommendation workflows.</p><h3>In short</h3><p>EODHD is one of the strongest all-around options for the AI agent era. It combines broad market coverage with precomputed indicators, developer-friendly structured data, an official MCP server, and a ChatGPT-oriented assistant. For teams that want something more AI-forward than classic enterprise vendors but broader than a narrow single-purpose API, EODHD is a very strong choice.</p><div><hr></div><h2>QuoteMedia</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T3Qn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a158f85-6722-4e0c-8fb4-5eb4a4b119c6_2613x1409.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T3Qn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a158f85-6722-4e0c-8fb4-5eb4a4b119c6_2613x1409.png 424w, https://substackcdn.com/image/fetch/$s_!T3Qn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a158f85-6722-4e0c-8fb4-5eb4a4b119c6_2613x1409.png 848w, https://substackcdn.com/image/fetch/$s_!T3Qn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a158f85-6722-4e0c-8fb4-5eb4a4b119c6_2613x1409.png 1272w, https://substackcdn.com/image/fetch/$s_!T3Qn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a158f85-6722-4e0c-8fb4-5eb4a4b119c6_2613x1409.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!T3Qn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a158f85-6722-4e0c-8fb4-5eb4a4b119c6_2613x1409.png" width="1456" height="785" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a158f85-6722-4e0c-8fb4-5eb4a4b119c6_2613x1409.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:785,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1613479,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/190589167?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a158f85-6722-4e0c-8fb4-5eb4a4b119c6_2613x1409.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T3Qn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a158f85-6722-4e0c-8fb4-5eb4a4b119c6_2613x1409.png 424w, https://substackcdn.com/image/fetch/$s_!T3Qn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a158f85-6722-4e0c-8fb4-5eb4a4b119c6_2613x1409.png 848w, https://substackcdn.com/image/fetch/$s_!T3Qn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a158f85-6722-4e0c-8fb4-5eb4a4b119c6_2613x1409.png 1272w, https://substackcdn.com/image/fetch/$s_!T3Qn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a158f85-6722-4e0c-8fb4-5eb4a4b119c6_2613x1409.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Overview</h3><p><a href="https://quotemedia.com/">QuoteMedia</a> is a long-standing market-data provider focused on real-time and historical data, news, analytics, and financial information for brokerages, websites, trading systems, and investor-facing products. Its Request APIs and OnDemand services are built around cloud-based access to market data, while its streaming products emphasize tick-by-tick delivery, low latency, and enterprise-grade reliability. 
QuoteMedia also highlights broad operational scale, including 110+ global exchanges, 200+ data APIs, 99.99% uptime, and 100+ news providers.</p><p>A notable strength of QuoteMedia is delivery flexibility. Its platform spans REST-style OnDemand APIs, WebSocket and other streaming interfaces, and SFTP-based file services for bulk delivery. It also supports JSON, XML, CSV, option-chain data, company profiles, historical time series, filings, and custom calculations. That makes QuoteMedia less of a single API product and more of a market-data delivery platform.</p><p>QuoteMedia&#8217;s public positioning is similar to Xignite in one important way: it is more infrastructure-oriented than explicitly LLM-oriented. In other words, its clearest strengths are reliability, breadth, delivery options, and integration into financial products, not public MCP or agent-marketing. That is an inference from the official materials reviewed.</p><h3>What makes it valuable in the AI Agent era?</h3><p><strong>1. Low-latency data for real-time agent monitoring</strong><br>QuoteMedia&#8217;s streaming stack is designed for real-time or delayed tick-by-tick data, normalized for ease of use and optimized for single-digit millisecond performance. For AI systems that monitor live markets, score signals, or trigger alerts and workflows off intraday movement, that kind of delivery profile is highly relevant.</p><p><strong>2. Multiple delivery modes for different agent architectures</strong><br>Modern AI finance stacks are not monolithic. Some components work best with REST requests, others with streams, and others with bulk files for offline training or evaluation. QuoteMedia supports cloud REST APIs, streaming APIs, and SFTP/file services, which makes it well suited to organizations building layered pipelines that combine real-time agent behavior with batch analytics and historical model development.</p><p><strong>3. 
Strong fit as a production data layer</strong><br>QuoteMedia offers market data, news, analytics, company profiles, option chains, filings, and historical data in structured formats such as JSON, XML, and CSV. That breadth makes it a useful foundation for internal copilots, research dashboards, summarization systems, and client-facing financial applications where the &#8220;AI&#8221; layer is built on top of the data platform rather than bundled by the vendor itself.</p><h3>In short</h3><p>QuoteMedia is a strong candidate for teams that care more about production-grade delivery and integration flexibility than about whether the vendor has already branded itself around AI agents. In the AI agent era, that still matters a lot: a reliable, low-latency, multi-format market-data backbone can be more valuable than flashy AI positioning if you are building your own orchestration layer.</p><div><hr></div><h2>Conclusion</h2><p>If the goal is to find the most AI-ready providers, Alpha Vantage, Tradier, and EODHD stand out because they already offer MCP or LLM-oriented support. Alpha Vantage is particularly strong for AI-native research tools, Tradier is strong for brokerage-connected agents, and EODHD is a strong general-purpose choice.</p><p>If the goal is enterprise-grade infrastructure for proprietary AI systems, Xignite and QuoteMedia remain highly relevant. They may be less visibly AI-marketed, but they are strong as scalable market data backbones.</p><p>So in the AI agent era, the best stock market data API depends on what you are building. For AI-native financial research, Alpha Vantage has a strong edge. For execution-oriented agents, Tradier stands out. For broad AI-enabled workflows, EODHD is highly competitive. For enterprise infrastructure, Xignite and QuoteMedia are still important players.</p>]]></content:encoded></item><item><title><![CDATA[7 SQL Use Cases Every Data Professional Should Know]]></title><description><![CDATA[Most people learn SQL as syntax. 
The real thing is knowing what kinds of problems it helps you solve.]]></description><link>https://www.nb-data.com/p/7-sql-use-cases-every-data-professional</link><guid isPermaLink="false">https://www.nb-data.com/p/7-sql-use-cases-every-data-professional</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Sat, 07 Mar 2026 12:19:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9orW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9orW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9orW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!9orW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!9orW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!9orW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9orW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2007256,&quot;alt&quot;:&quot;7 SQL Use Cases Every Data Professional Should Know&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/190182290?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="7 SQL Use Cases Every Data Professional Should Know" title="7 SQL Use Cases Every Data Professional Should Know" srcset="https://substackcdn.com/image/fetch/$s_!9orW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!9orW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!9orW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png 1272w, 
https://substackcdn.com/image/fetch/$s_!9orW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9bbb6184-6b0e-45de-9ab4-0f7ec96007de_1376x768.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>A lot of people learn SQL in a frustrating way.</p><p>They start with <code>SELECT</code>, <code>FROM</code>, <code>WHERE</code>, <code>GROUP BY</code>, maybe a few joins, and if they stay long enough, a window function or two. They can write queries. They can pass the exercises. 
But when they face a real business question, they still freeze.</p><p>That usually happens because they learned SQL as a list of clauses instead of a way to think.</p><p>In real work, SQL is rarely about showing that you remember syntax. It is about recognizing what kind of problem you are facing once a business question hits the data. For example:</p><ul><li><p>Is this a reporting problem?</p></li><li><p>A funnel problem?</p></li><li><p>A cohort problem?</p></li><li><p>A segmentation problem?</p></li><li><p>A QA problem?</p></li></ul><p>The moment you can recognize that, SQL becomes much less intimidating and much more useful. That is the shift that matters.</p><p>The people who get genuinely strong at SQL are usually not the people who memorize the most functions. They are the people who can look at a business question and quickly understand what kind of data transformation it needs.</p><p>So instead of thinking about SQL as &#8220;a language I should know,&#8221; I think it is more useful to think about it as a toolkit for a handful of recurring jobs.</p><p>Here are seven of the most important ones. Let&#8217;s get into it.</p><div><hr></div><h2>1. KPI reporting</h2><p>When teams want to know what is happening in the business, they usually start with some version of a KPI question. Revenue by month. Daily active users. Orders by country. Average order value. Churn rate by plan. Refund rate by product. These are not flashy questions, but they are the foundation of most reporting work.</p><p>This is where SQL starts becoming practical. You are not trying to prove how advanced you are. 
You are trying to turn raw data into something clear enough for another person to act on.</p><p>That means defining the metric carefully, filtering the right time window, grouping at the right level, and returning a result that is readable. The technical tools are simple, but the judgment behind them matters a lot.</p><p>A lot of people underestimate this kind of SQL because it feels too basic. I think that is a mistake. A team with weak KPI logic usually ends up with weak everything else.</p><p>A simple example is monthly revenue by product category:</p><pre><code>SELECT
    DATE_TRUNC('month', order_date) AS order_month,
    product_category,
    SUM(revenue) AS total_revenue
FROM orders
WHERE order_date &gt;= DATE '2026-01-01'
GROUP BY 1, 2
ORDER BY 1, 3 DESC;</code></pre><p>This is a basic grouped summary, but that is exactly why it matters. A lot of useful SQL is just good filtering, clean aggregation, and returning a table that another person can use.</p><h2>2. Funnel analysis</h2><p>The second major use case is figuring out where people drop off.</p><p>This is where SQL starts feeling very close to product and growth work. A funnel question usually sounds like this: how many users started onboarding, how many completed profile setup, how many created their first project, and how many upgraded? In ecommerce, the same question shows up as view product, add to cart, begin checkout, and pay.</p><p>What makes funnel analysis valuable is that it shows where interest turns into friction.</p><p>A lot of the time, the problem is not &#8220;traffic is low.&#8221; The problem is that the path breaks at one specific step. SQL helps you see that step clearly. It lets you move from a vague sense that &#8220;conversion feels weak&#8221; to a more precise question like &#8220;why do so many users disappear between signup and first action?&#8221;</p><p>A simple event-based funnel might look like this:</p><pre><code>SELECT
    step_name,
    COUNT(DISTINCT user_id) AS users_at_step
FROM onboarding_events
WHERE event_date &gt;= DATE '2026-03-01'
GROUP BY 1
ORDER BY
    CASE step_name
        WHEN 'signup' THEN 1
        WHEN 'verify_email' THEN 2
        WHEN 'create_project' THEN 3
        WHEN 'first_active_use' THEN 4
    END;</code></pre><p>This is not the most advanced funnel query in the world, but it already gives you a clearer conversation. Instead of saying &#8220;activation is weak,&#8221; you can ask, &#8220;Why do so many users disappear between verification and first project creation?&#8221;</p><p>Once you can answer that, the conversation gets much more useful.</p><h2>3. Cohort retention analysis</h2><p>This is one of the most important SQL use cases because it forces better thinking.</p><p>A cohort retention analysis groups users by a shared starting point, then checks whether they come back in later periods. That sounds simple, but it is one of those areas where small definition choices change the whole story. What puts a user into a cohort? What counts as a return? What does a week mean? Should a user count once per week or every time they generate an event?</p><p>That is why good retention work is not mainly about writing SQL. It is about locking the logic before the SQL ever begins.</p><p>This is also where SQL becomes more than a reporting language. It becomes a way of expressing lifecycle behavior. Once you can build a trustworthy retention table, you can stop asking &#8220;are users coming back?&#8221; in a vague way and start asking &#8220;which users are sticking, when do they drop, and what changed across cohorts?&#8221;</p><p>That is one of the reasons I like this use case so much. It pushes people past syntax into actual analytical design.</p><p>A very small example of the logic looks like this:</p><pre><code>WITH user_cohort AS (
    SELECT
        user_id,
        DATE_TRUNC('week', MIN(login_date)) AS cohort_week
    FROM logins
    GROUP BY 1
),
user_activity AS (
    SELECT
        l.user_id,
        DATE_TRUNC('week', l.login_date) AS activity_week
    FROM logins l
    GROUP BY 1, 2
)
SELECT
    c.cohort_week,
    a.activity_week,
    COUNT(DISTINCT a.user_id) AS active_users
FROM user_cohort c
JOIN user_activity a
  ON c.user_id = a.user_id
GROUP BY 1, 2
ORDER BY 1, 2;</code></pre><p>This is only the skeleton, not the full retention table. But even here, you can already see the shape: assign the cohort, map later activity, then aggregate by period.</p><p>You can check the deep dive of this use case here:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;a0951efb-026e-4d0d-aeaf-8731e89e2fbc&quot;,&quot;caption&quot;:&quot;Most retention tables are not wrong because the SQL is complicated.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Cohort Retention in SQL&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:6000855,&quot;name&quot;:&quot;Cornellius Yudha Wijaya&quot;,&quot;bio&quot;:&quot;Sharing Data Knowledge to improve your values&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/583076b2-657b-44bf-8aa9-9263e5bf04f0_544x544.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-03-06T18:39:19.710Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!5opy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.nb-data.com/p/cohort-retention-in-sql&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:190126571,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:1,&quot;comment_count&quot;:0,&quot;publication_id&quot;:37262,&quot;publication_name&quot;:&quot;Non-Brand 
Data&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!06DP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6c0e1cde-d120-4029-8ffd-2a8c7c6e4504_1280x1280.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><h2>4. Segmentation</h2><p>Once you know the overall number, the next question is almost always: who exactly is driving it?</p><p>That is segmentation.</p><p>Averages are useful, but they hide a lot. SQL becomes much more powerful once you stop treating all users as one group and start cutting the data into meaningful slices. That might mean country, plan, acquisition channel, device type, power users versus casual users, or first purchase month.</p><p>And in practice, this is where a lot of strong SQL users separate themselves. They stop producing one big average and start showing where the business behaves differently across groups.</p><p>A simple segmentation example might be conversion rate by acquisition channel:</p><pre><code>SELECT
    acquisition_channel,
    COUNT(DISTINCT user_id) AS users,
    SUM(CASE WHEN converted = 1 THEN 1 ELSE 0 END) AS converted_users,
    ROUND(
        1.0 * SUM(CASE WHEN converted = 1 THEN 1 ELSE 0 END)
        / COUNT(DISTINCT user_id),
        3
    ) AS conversion_rate
FROM user_conversion_summary
GROUP BY 1
ORDER BY conversion_rate DESC;</code></pre><p>This is where SQL starts feeling strategic. You stop asking, &#8220;Is conversion improving?&#8221; and start asking, &#8220;Is conversion improving for the users we actually care about?&#8221;</p><h2>5. Experiment analysis</h2><p>If you work near product or growth teams, SQL becomes very important the moment experiments show up.</p><p>Before anyone talks about significance, lift, or confidence intervals, someone still has to build the dataset properly. Who was in the control group? Who was in the treatment group? Who converted? Over what window? Were there logging issues? Did the assignment logic work as expected?</p><p>A lot of that early work is SQL.</p><p>And this matters more than people think, because if the experiment table is wrong, everything that comes after it is already compromised. If the assignment table is joined incorrectly, if the outcome window is inconsistent, or if duplicated rows quietly inflate conversions, the eventual statistical discussion becomes much less meaningful.</p><p>So even though experiment analysis sounds advanced, a lot of it still comes down to careful SQL habits and clean dataset construction.</p><p>A simple experiment summary might look like this:</p><pre><code>SELECT
    variant,
    COUNT(DISTINCT user_id) AS users,
    SUM(CASE WHEN purchased = 1 THEN 1 ELSE 0 END) AS purchasers,
    ROUND(
        1.0 * SUM(CASE WHEN purchased = 1 THEN 1 ELSE 0 END)
        / COUNT(DISTINCT user_id),
        3
    ) AS purchase_rate
FROM experiment_user_summary
WHERE experiment_name = 'checkout_redesign_v1'
GROUP BY 1
ORDER BY 1;</code></pre><p>That is not the full experiment analysis, but it is the foundation.</p><h2>6. Data quality and QA checks</h2><p>This is one of the least glamorous SQL use cases, and one of the most valuable.</p><p>A huge amount of trust in data work comes from catching bad structure early. Duplicate rows. Missing keys. Broken joins. Sudden changes in counts. Tables that stopped updating. Records that should be impossible but somehow exist anyway.</p><p>SQL is excellent for this kind of work because it is good at isolating patterns, comparing counts, checking coverage, and surfacing anomalies before they become reporting problems.</p><p>This is also one of the places where data professionals become more mature in practice. They stop using SQL only to answer the question they were asked, and they start using SQL to challenge whether the dataset itself deserves trust.</p><p>That is a very different mindset.</p><p>Once you develop it, your work usually becomes much more reliable.</p><p>For example, if you want to check for duplicate order IDs:</p><pre><code>SELECT
    order_id,
    COUNT(*) AS row_count
FROM orders
GROUP BY 1
HAVING COUNT(*) &gt; 1
ORDER BY row_count DESC;</code></pre><p>This is basic, but incredibly useful. </p><h2>7. Operational monitoring</h2><p>The last use case is the one that makes SQL feel closest to the day-to-day operating layer of a business.</p><p>Sometimes the question is not &#8220;what happened this quarter?&#8221; Sometimes the question is &#8220;did the pipeline run?&#8221;, &#8220;are transactions missing?&#8221;, &#8220;did yesterday&#8217;s volume collapse?&#8221;, or &#8220;did a critical table stop refreshing?&#8221;</p><p>At that point, SQL is not just helping with analysis. It is helping keep the system honest.</p><p>This kind of work often lives somewhere between analytics, operations, and data engineering. You are comparing expected versus actual counts, checking daily or weekly movement, and trying to spot problems before somebody else finds them in a broken dashboard or an angry meeting.</p><p>If you only think of SQL as a tool for reports, you miss how often it becomes part of the business&#8217;s operational nervous system.</p><p>A simple monitoring query might compare day-over-day order counts:</p><pre><code>SELECT
    order_date,
    COUNT(*) AS orders_today,
    LAG(COUNT(*)) OVER (ORDER BY order_date) AS orders_yesterday
FROM orders
GROUP BY 1
ORDER BY 1;</code></pre><p>This is where window functions become especially useful. They let you compare each row to related rows while keeping the row-level result visible, which is exactly the kind of thing you want for trend and monitoring work. One caveat: <code>LAG</code> returns the previous row in the result, so if a date has no orders at all, <code>orders_yesterday</code> reflects the last date that did have orders, not the calendar day before.</p><h2>The bigger point</h2><p>If you look across all seven use cases, the pattern is pretty clear.</p><p>SQL is rarely valuable because of its isolated syntax.</p><p>It is valuable because the same small set of ideas keeps getting reused across real work.</p><p>That is why strong SQL users usually do not sound like they are reciting functions. They sound like they understand data shape.</p><p>That is a much better goal than &#8220;learn more SQL syntax.&#8221;</p><h2>Where to go next</h2><p>If you are still early, I would not try to learn every advanced clause in one sitting.</p><p>I would focus on connecting SQL to actual problems.</p><p>That is exactly why I built the SQL track into the NBD Focus Map. The point is not to learn SQL randomly. 
The point is to see how the pieces fit together and start shipping small, useful work with them.</p><h2>Start here</h2><p>If you want the broader path, start with the <strong>Focus Map</strong>:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/nbd-focus-map-free-pdf&quot;,&quot;text&quot;:&quot;Focus Map&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/nbd-focus-map-free-pdf"><span>Focus Map</span></a></p><p>If you want the full paid system, use:</p><ul><li><p><strong>Vault:</strong> <a href="https://www.nb-data.com/p/nbd-reading-vault-paid-guided-paths">https://www.nb-data.com/p/nbd-reading-vault-paid-guided-paths</a></p></li><li><p><strong>Template Index:</strong> <a href="https://www.nb-data.com/p/template-pack-index-paid">https://www.nb-data.com/p/template-pack-index-paid</a></p></li><li><p><strong>Subscriber Benefits:</strong> <a href="https://www.nb-data.com/p/subscriber-benefits">https://www.nb-data.com/p/subscriber-benefits</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Cohort Retention in SQL]]></title><description><![CDATA[From Raw Events to a Decision-Ready Table]]></description><link>https://www.nb-data.com/p/cohort-retention-in-sql</link><guid isPermaLink="false">https://www.nb-data.com/p/cohort-retention-in-sql</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Fri, 06 Mar 2026 18:39:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5opy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!5opy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5opy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5opy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5opy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5opy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5opy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg" width="1312" height="736" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:736,&quot;width&quot;:1312,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:89400,&quot;alt&quot;:&quot;Cohort Retention in 
SQL&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/190126571?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Cohort Retention in SQL" title="Cohort Retention in SQL" srcset="https://substackcdn.com/image/fetch/$s_!5opy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5opy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5opy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5opy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e76dc69-8ad0-449f-8f74-a8b8885dbdd9_1312x736.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Most retention tables are not wrong because the SQL is complicated.</p><p>They are wrong because the definitions are loose.</p><p>Someone says, &#8220;Let&#8217;s look at retention,&#8221; a query gets written, a heatmap shows up in a dashboard, and suddenly everyone is talking about Week 1 and Month 1 as if those numbers are objective facts. They usually are not. They are the result of choices. What counts as the start of a user&#8217;s journey? What counts as a return? What exactly is a week? What timezone are we using? Are we measuring one user once per period, or accidentally counting heavy users multiple times?</p><p>That is the real work in cohort retention. Not the division. Not the pivot table. The real work is deciding what story the table is allowed to tell.</p><p>At its core, cohort analysis is simple. You group users by a shared starting point, then measure what those users do in later periods. 
That is the common backbone behind most cohort SQL tutorials and warehouse implementations.</p><p>What makes it tricky is that small choices can change the story enough to change the decision.</p><p>So in this piece, I want to show you how I think about cohort retention in SQL when I want something that is not just presentable, but actually trustworthy. We will walk through a small sample dataset, turn it into a retention table step by step, and discuss the parts that often go wrong: cohort definition, return-event design, week boundaries, duplicate activity, partial cohorts, and interpretation.</p><div><hr></div><h2>Start with the question, not the query</h2><p>Before touching SQL, I like to ask one uncomfortable question:</p><p><strong>What exactly do I want this retention table to help me decide?</strong></p><p>That question matters because different cohort definitions answer different business questions.</p><p>If I group users by the week they signed up, I am usually asking something about onboarding, activation, or acquisition quality. I want to know whether new users are sticking around after entering the funnel.</p><p>If I group users by the week they first did something meaningful, I am asking something slightly different. I am saying that signup is not the real beginning of value. Maybe the real beginning is the first login, the first purchase, the first report built, or the first document uploaded. In that case, I am less interested in the funnel entry and more interested in what happens once a user actually starts using the product.</p><p>Both are valid. 
But they are not interchangeable.</p><p>The same holds for the return event. If I define retention as &#8220;any page view,&#8221; my table might look reassuring while hiding the fact that users are not doing anything meaningful. If I define retention as &#8220;purchase,&#8221; the metric might be more valuable but also much sparser. There is no universally correct event. There are only events that are more or less aligned with the value loop you care about.</p><p>Then there is the time bucket. This is the part people often treat as neutral, even though it really isn&#8217;t. A daily retention table tells a different story than a weekly one. A weekly table tells a different story than a monthly one. And even the idea of a &#8220;week&#8221; is less fixed than people think. BigQuery, for example, distinguishes between <code>WEEK</code>, <code>WEEK(&lt;WEEKDAY&gt;)</code>, and <code>ISOWEEK</code>, and those choices affect how dates are grouped and how period differences are calculated.</p><p>That is why I think of cohort retention as a design problem before I think of it as a SQL problem.</p><h2>The version we&#8217;re building here</h2><p>To make this concrete, let&#8217;s keep the example small and explicit.</p><p>In this walkthrough:</p><ul><li><p>A user&#8217;s cohort is the <strong>week of their first login</strong></p></li><li><p>Retention means they performed a <strong>login</strong> in a later week</p></li><li><p>The table uses <strong>calendar weeks</strong></p></li><li><p>Each user should count at most <strong>once per week</strong></p></li></ul><p>That last condition matters a lot. If a user logs in ten times in the same week, they are still one retained user for that week. Retention is about whether someone came back in the period, not how noisy their event stream was.</p><h2>Sample data</h2><p>Here is a tiny events table we can use end-to-end.</p>
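<p>Whatever the exact rows look like, the four rules above could be sketched in SQL roughly as follows. This is a minimal illustration in BigQuery syntax, not a definitive implementation: the <code>events</code> table and its <code>user_id</code>, <code>event_name</code>, and <code>event_ts</code> columns are hypothetical, and <code>ISOWEEK</code> (Monday-start weeks, with dates taken in UTC) is just one possible reading of &#8220;calendar weeks.&#8221;</p><pre><code>-- Sketch only: the events table and its column names are assumptions.
WITH weekly_logins AS (
    -- One row per user per ISO week, so repeat logins in a week count once
    SELECT DISTINCT
        user_id,
        DATE_TRUNC(DATE(event_ts), ISOWEEK) AS activity_week
    FROM events
    WHERE event_name = 'login'
),
cohorts AS (
    -- Cohort = the week of each user's first login
    SELECT
        user_id,
        MIN(activity_week) AS cohort_week
    FROM weekly_logins
    GROUP BY user_id
)
SELECT
    c.cohort_week,
    -- Whole weeks between the cohort week and the activity week (0 = cohort week)
    DATE_DIFF(a.activity_week, c.cohort_week, ISOWEEK) AS week_number,
    COUNT(DISTINCT a.user_id) AS active_users
FROM cohorts AS c
JOIN weekly_logins AS a
    ON a.user_id = c.user_id
GROUP BY 1, 2
ORDER BY 1, 2;</code></pre><p>In this sketch, week 0 is the cohort week itself, so its <code>active_users</code> value is the cohort size, and dividing later weeks by it would give a retention rate. The <code>SELECT DISTINCT</code> in the first step collapses the event stream to one row per user per week, which is what keeps a heavy user from being counted more than once in a period.</p>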
      <p>
          <a href="https://www.nb-data.com/p/cohort-retention-in-sql">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Template Pack Index (Paid)]]></title><description><![CDATA[Last updated: 24 May 2026]]></description><link>https://www.nb-data.com/p/template-pack-index-paid</link><guid isPermaLink="false">https://www.nb-data.com/p/template-pack-index-paid</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Sun, 01 Mar 2026 15:43:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!06DP!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6c0e1cde-d120-4029-8ffd-2a8c7c6e4504_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is the paid asset library for Non-Brand Data.</p><p>Each template pack is designed to help you ship faster, not just read more. Pick one pack based on your current goal, use it for 7&#8211;14 days, and finish with a visible artifact: a notebook, SQL analysis, write-up, repo, or project page.</p>
      <p>
          <a href="https://www.nb-data.com/p/template-pack-index-paid">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[NBD Reading Vault (Paid): Guided Paths + Mini-Projects]]></title><description><![CDATA[Last updated: 24 Feb 2026]]></description><link>https://www.nb-data.com/p/nbd-reading-vault-paid-guided-paths</link><guid isPermaLink="false">https://www.nb-data.com/p/nbd-reading-vault-paid-guided-paths</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Tue, 24 Feb 2026 12:45:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!06DP!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6c0e1cde-d120-4029-8ffd-2a8c7c6e4504_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Start here if you feel overwhelmed</h3><p>If you are a paid member and do not know what to read next, do not browse the archive.</p><p>Choose one track below, follow the reading order, and ship one mini-project in 2&#8211;4 weeks.</p><p>This page is designed to help you move from reading to output.</p><p><strong>Do this before browsing the archive:</strong></p><ol><li><p>Pick one track only.</p></li><li><p>Read the first post in that &#8230;</p></li></ol>
      <p>
          <a href="https://www.nb-data.com/p/nbd-reading-vault-paid-guided-paths">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[✨Subscriber Benefits]]></title><description><![CDATA[Everything included in Non-Brand Data. Updated: February 2026]]></description><link>https://www.nb-data.com/p/subscriber-benefits</link><guid isPermaLink="false">https://www.nb-data.com/p/subscriber-benefits</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Sun, 22 Feb 2026 13:26:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_8Co!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b33d87-8050-4869-9be7-989f15554517_1533x593.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Non-Brand Data Subscriber Benefits</h1><p>This is the up-to-date summary of what you get as a free reader, paid member, or founding member.</p><p>Non-Brand Data is built around one simple idea:</p><p>Learn with structure. Practice with intention. Ship something useful.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_8Co!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b33d87-8050-4869-9be7-989f15554517_1533x593.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_8Co!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b33d87-8050-4869-9be7-989f15554517_1533x593.png 424w, https://substackcdn.com/image/fetch/$s_!_8Co!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b33d87-8050-4869-9be7-989f15554517_1533x593.png 848w, 
https://substackcdn.com/image/fetch/$s_!_8Co!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b33d87-8050-4869-9be7-989f15554517_1533x593.png 1272w, https://substackcdn.com/image/fetch/$s_!_8Co!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b33d87-8050-4869-9be7-989f15554517_1533x593.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_8Co!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b33d87-8050-4869-9be7-989f15554517_1533x593.png" width="1456" height="563" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d1b33d87-8050-4869-9be7-989f15554517_1533x593.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:563,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:67585,&quot;alt&quot;:&quot;Non-Brand Data Subscriber Benefits&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/188513081?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b33d87-8050-4869-9be7-989f15554517_1533x593.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Non-Brand Data Subscriber Benefits" title="Non-Brand Data Subscriber Benefits" srcset="https://substackcdn.com/image/fetch/$s_!_8Co!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b33d87-8050-4869-9be7-989f15554517_1533x593.png 424w, 
https://substackcdn.com/image/fetch/$s_!_8Co!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b33d87-8050-4869-9be7-989f15554517_1533x593.png 848w, https://substackcdn.com/image/fetch/$s_!_8Co!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b33d87-8050-4869-9be7-989f15554517_1533x593.png 1272w, https://substackcdn.com/image/fetch/$s_!_8Co!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1b33d87-8050-4869-9be7-989f15554517_1533x593.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Full version</h2><h3>Free subscribers</h3><p>Start here: <strong><a href="https://www.nb-data.com/p/nbd-focus-map-free-pdf?r=3kmaf">NBD Focus Map (Free PDF)</a></strong></p><p><strong>What you get:</strong></p><ul><li><p>NBD Focus Map</p></li><li><p>Public posts and public archive</p></li><li><p>The best free posts in order</p></li><li><p>Subscriber chat and comment threads</p></li><li><p>Free practical resources, including the AI Evaluation Checklist for Data Professionals</p></li><li><p>Occasional free guides, templates, and learning notes</p></li></ul><p><strong>Best for:</strong></p><p>Readers who want a structured starting point before deciding whether to go deeper.</p><p>Use this tier if you want to pick one track, follow a simple cadence, and start shipping small artifacts from what you learn.</p><h3>Paid members</h3><p>Paid members get the structured side of Non-Brand Data.</p><p>What you get:</p><ul><li><p>Member-only deep-dive posts</p></li><li><p>Full archive access</p></li><li><p>NBD Reading Vault</p></li><li><p>Template Pack Index</p></li><li><p>Monthly or periodic template updates</p></li><li><p>Guided learning paths across SQL, Python + ML, and GenAI / RAG</p></li><li><p>Practical assets such as checklists, worksheets, rubrics, and project templates</p></li></ul><p><strong>Best for:</strong></p><p>Readers who do not want to browse randomly and want a clearer path from learning to practice.</p><p>Use this tier if you want to follow guided paths, reuse templates, and turn the archive into a practical learning system.</p><h2>What paid members should use first</h2><p>If you feel lost in the archive, start with the <strong><a href="https://www.nb-data.com/p/nbd-reading-vault-paid-guided-paths?r=3kmaf">NBD Reading Vault</a></strong>.</p><p>The Vault helps you:</p><ul><li><p>pick one track,</p></li><li><p>follow the reading 
order,</p></li><li><p>commit to a 2&#8211;4 week sprint,</p></li><li><p>and ship one mini-project.</p></li></ul><p>If you already know what you want to build, go to the <strong><a href="https://www.nb-data.com/p/template-pack-index-paid?r=3kmaf">Template Pack Index</a></strong>.</p><p>The Template Pack Index gives you reusable assets such as:</p><ul><li><p>Sprint Pack</p></li><li><p>SQL Work Pack</p></li><li><p>ML Experiment Pack</p></li><li><p>AI Evaluation Checklist</p></li></ul><div><hr></div><h3>Founding members</h3><p>Founding members get everything in Paid, plus priority feedback and one annual review call.</p><p>The annual review call is for one project, portfolio entry, write-up, or learning artifact.</p><p>We focus on what to change so the work becomes clearer, stronger, and closer to hiring-manager-ready or stakeholder-ready quality.</p><p>What you get:</p><ul><li><p>Everything in Paid</p></li><li><p>Priority feedback</p></li><li><p>One annual 30-minute review call</p></li><li><p>Project, portfolio, or artifact review</p></li></ul><div><hr></div><h2>One-time purchases (optional)</h2><p>These are separate from subscriptions. 
Buy once and reuse anytime.</p><ul><li><p><strong><a href="https://cornelliusyudhawijay.gumroad.com/l/otdloq">Portfolio Rubric Toolkit</a></strong></p></li><li><p><strong><a href="https://cornelliusyudhawijay.gumroad.com/">Data Science Resume Template</a> (FREE)</strong></p></li><li><p><strong><a href="https://cornelliusyudhawijay.gumroad.com/l/hdhuw">Python Packages to Learn Data Science (e-book)</a> (FREE)</strong></p></li></ul><div><hr></div><h2>How to redeem your benefits</h2><h3>All subscribers</h3><ul><li><p><strong><a href="https://www.nb-data.com/p/nbd-focus-map-free-pdf?r=3kmaf">Focus Map</a></strong></p></li><li><p><strong><a href="https://www.nb-data.com/p/welcome-to-non-brand-data-by-cornellius">Start Here</a></strong></p></li></ul><h3>Paid members</h3><ul><li><p>Access member-only posts by logging in with the email you used to subscribe</p></li><li><p>Template packs are delivered by email and will also be collected in one place as the vault grows</p></li><li><p>The member vault reading list will be added here once it is published</p></li></ul><h3>Founding members</h3><p>Reply to any email with the subject: <strong>Founding review</strong><br>Include:</p><ul><li><p>a link to your project/repo/write-up</p></li><li><p>what you want feedback on</p></li></ul><p>I will send the booking link, and we will schedule the 30-minute call.</p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[Creating a Daily Bulk Ingestion Pipeline for Historical Price Data and Fundamentals]]></title><description><![CDATA[Automate your financial information in your database]]></description><link>https://www.nb-data.com/p/creating-a-daily-bulk-ingestion-pipeline</link><guid isPermaLink="false">https://www.nb-data.com/p/creating-a-daily-bulk-ingestion-pipeline</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Thu, 19 Feb 2026 10:25:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kUfK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad14e7-a6b1-4ea7-8eda-de1a540b1f9e_1120x706.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kUfK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad14e7-a6b1-4ea7-8eda-de1a540b1f9e_1120x706.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kUfK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad14e7-a6b1-4ea7-8eda-de1a540b1f9e_1120x706.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kUfK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad14e7-a6b1-4ea7-8eda-de1a540b1f9e_1120x706.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!kUfK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad14e7-a6b1-4ea7-8eda-de1a540b1f9e_1120x706.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kUfK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad14e7-a6b1-4ea7-8eda-de1a540b1f9e_1120x706.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kUfK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad14e7-a6b1-4ea7-8eda-de1a540b1f9e_1120x706.jpeg" width="1120" height="706" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aaad14e7-a6b1-4ea7-8eda-de1a540b1f9e_1120x706.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:706,&quot;width&quot;:1120,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!kUfK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad14e7-a6b1-4ea7-8eda-de1a540b1f9e_1120x706.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kUfK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad14e7-a6b1-4ea7-8eda-de1a540b1f9e_1120x706.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!kUfK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad14e7-a6b1-4ea7-8eda-de1a540b1f9e_1120x706.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kUfK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaad14e7-a6b1-4ea7-8eda-de1a540b1f9e_1120x706.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@iridial_?utm_source=medium&amp;utm_medium=referral">iridial</a> on <a 
href="https://unsplash.com/?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure></div><p>In finance, we are usually trying to answer two related questions at the same time:</p><ul><li><p>What did the market do?</p></li><li><p>What did the business do?</p></li></ul><p>Prices move every trading day, reflecting new information and expectations. Fundamentals, however, update more slowly and in batches because public companies report on a cycle (e.g., U.S. issuers file Form 10-Q after the first three fiscal quarters and an annual 10-K). This becomes a pain point in valuation and screening reviews: we need both datasets as of a specific point in time, but live pulls made at different times can return inconsistent snapshots.</p><p>This is why a daily ingestion pipeline exists. It gives us a consistent record that we can reuse without re-downloading or second-guessing what we just pulled. Instead of relying on a live fetch each time, we maintain a small local dataset that updates on a schedule and is ready for further processing.</p><p>In this article, we will learn how to develop a daily bulk ingestion pipeline for historical price and fundamental data using source data from <a href="https://site.financialmodelingprep.com/?utm_source=medium&amp;utm_medium=medium&amp;utm_campaign=corn11">Financial Modeling Prep (FMP)</a>.</p><p>Curious about it? Let&#8217;s get into it.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2><strong>Foundation</strong></h2><p>Before we move into the implementation details, it helps to treat this project as an ingestion layer built on top of an external data provider. 
Building this layer on top of the <a href="https://site.financialmodelingprep.com/?utm_source=medium&amp;utm_medium=medium&amp;utm_campaign=corn11">Financial Modeling Prep (FMP)</a> API offers several practical benefits for financial analysis work.</p><p>First, it reduces duplication by reusing steps for requesting data, validating responses, standardising column names, and applying rules (e.g., date handling) for each symbol.</p><p>Second, it creates a single control point for the workflow, centralizing API key handling and daily logic rather than duplicating logic across scripts.</p><p>Third, it provides a stable historical record by maintaining a local dataset rather than recomputing results from live calls, thereby simplifying research and reporting.</p><p>Finally, it supports routine operation with two phases: an initial backfill to build historical coverage and a daily run to keep data current. Once scheduled, the dataset is automatically updated, ensuring a reliable workflow.</p><h2><strong>The Data Source</strong></h2><p>Let&#8217;s start building our daily ingestion pipeline by deciding which datasets we will pull from FMP. In this project, all data comes from FMP&#8217;s Stable API, which uses a single base URL and a consistent URL pattern:</p><pre><code>https://financialmodelingprep.com/stable/</code></pre><p>In practice, FMP provides many endpoints, but this pipeline intentionally uses only a small subset. 
The goal is to identify the minimum datasets required to build a reliable store of historical prices and core financial statements, without introducing optional datasets that complicate maintenance.</p><p>For this pipeline, we rely on these endpoints:</p><ul><li><p><strong>Company search</strong> (<code>search-symbol</code>): Lets you search by company name or partial ticker and returns candidates with symbols, names, exchanges, and currencies.</p></li><li><p><strong>Company profile</strong> (<code>profile</code>): Returns the baseline company metadata you typically want to store alongside your price and fundamentals tables.</p></li><li><p><strong>Income statement</strong> (<code>income-statement</code>): Provides revenue, net income, and other income statement fields over time.</p></li><li><p><strong>Balance sheet statement</strong> (<code>balance-sheet-statement</code>): Provides assets, liabilities, and equity fields that help you understand the company&#8217;s financial position.</p></li><li><p><strong>Cash flow statement</strong> (<code>cash-flow-statement</code>): Provides operating, investing, and financing cash flow fields, which are essential for evaluating cash generation and sustainability.</p></li><li><p><strong>Historical end-of-day prices</strong> (<code>historical-price-eod/full</code>): Provides daily OHLCV and related fields for historical price storage.</p></li></ul><p>These datasets are sufficient to build a clean ingestion pipeline that stores daily prices by date and financial statements by reporting period, while keeping the system simple and easy to run every day.</p><h2><strong>Project Structure</strong></h2><p>This project is intentionally organised to separate the application, data storage, and entry points.</p><p>A simplified view of the project looks like this:</p><pre><code>fmp_daily_ingestion/
&#9500;&#9472; .github/
&#9474;  &#9492;&#9472; workflows/
&#9474;     &#9492;&#9472; daily_ingestion.yml
&#9500;&#9472; app/
&#9474;  &#9500;&#9472; __init__.py
&#9474;  &#9500;&#9472; db.py
&#9474;  &#9500;&#9472; fmp_client.py
&#9474;  &#9500;&#9472; pipeline.py
&#9474;  &#9492;&#9472; settings.py
&#9500;&#9472; data/
&#9474;  &#9500;&#9472; fmp.sqlite3
&#9474;  &#9492;&#9472; scheduler.log
&#9500;&#9472; scripts/
&#9474;  &#9500;&#9472; __init__.py
&#9474;  &#9500;&#9472; backfill_symbols.py
&#9474;  &#9500;&#9472; backfill_prices.py
&#9474;  &#9500;&#9472; run_daily.py
&#9474;  &#9500;&#9472; scheduler.py
&#9474;  &#9492;&#9472; check_db.py
&#9500;&#9472; .env
&#9492;&#9472; requirements.txt</code></pre><p>With the project structure in place, we can build the daily ingestion pipeline.</p><h2><strong>Step-by-Step Walkthrough</strong></h2><p>In this section, we will go through how the daily ingestion pipeline is built, step by step.</p><h3><strong>Step 1: Define dependencies and configuration</strong></h3><p>First, we set up the <code>requirements.txt</code> file, keeping the dependencies minimal.</p><pre><code>requests
python-dotenv
pandas
schedule</code></pre><p>We also define a <code>.env</code> file, which supplies runtime configuration without hardcoding secrets or machine-specific paths into the code.</p><pre><code>FMP_API_KEY=YOUR_KEY
FMP_STABLE_BASE_URL=https://financialmodelingprep.com/stable
DB_PATH=data/fmp.sqlite3

FMP_WATCHLIST=AAPL,MSFT,TSLA
FUNDAMENTALS_PERIODS_TO_REFRESH=4

REQUEST_TIMEOUT=30
REQUEST_SLEEP=0.15</code></pre><p>FMP&#8217;s Stable API uses a single base URL and authenticates through an API key passed as a query parameter.</p><h3><strong>Step 2: Establish a single configuration contract</strong></h3><p>Next, we will create <code>settings.py</code>, which lets every script and module read the configuration consistently. This module does the following:</p><ul><li><p>load <code>.env</code></p></li><li><p>validate required values (especially <code>FMP_API_KEY</code>)</p></li><li><p>provide defaults for optional settings</p></li></ul><p>The implementation looks like this:</p><pre><code># app/settings.py
import os
from dotenv import load_dotenv

# Load .env file explicitly
load_dotenv()

FMP_API_KEY = os.getenv("FMP_API_KEY")
if not FMP_API_KEY:
    raise RuntimeError("Missing FMP_API_KEY. Set it as an environment variable or in .env file.")

# FMPClient calls the Stable API only (fundamentals and historical prices).
# FMP_V3_BASE_URL is still read here and passed by the scripts, but the client never calls it.
FMP_STABLE_BASE_URL = os.getenv("FMP_STABLE_BASE_URL", "https://financialmodelingprep.com/stable").rstrip("/")
FMP_V3_BASE_URL = os.getenv("FMP_V3_BASE_URL", "https://financialmodelingprep.com/api/v3").rstrip("/")

WATCHLIST = [s.strip().upper() for s in os.getenv("FMP_WATCHLIST", "AAPL,MSFT,TSLA").split(",") if s.strip()]

DB_PATH = os.getenv("DB_PATH", "data/fmp.sqlite3")

# Daily fundamentals: fetch last N rows and upsert (simple + idempotent).
FUNDAMENTALS_PERIODS_TO_REFRESH = int(os.getenv("FUNDAMENTALS_PERIODS_TO_REFRESH", "4"))

REQUEST_TIMEOUT = int(os.getenv("REQUEST_TIMEOUT", "30"))
REQUEST_SLEEP = float(os.getenv("REQUEST_SLEEP", "0.15"))</code></pre><p>This becomes the project&#8217;s control plane: whether you run the project locally, in GitHub Actions, or under a scheduler, you change only environment values, not application code.</p><h3><strong>Step 3: Implement a Stable API client</strong></h3><p>In this section, we will build our client in <code>fmp_client.py</code>. The client should be the only module that knows how to:</p><ul><li><p>build Stable URLs</p></li><li><p>attach <code>apikey=...</code></p></li><li><p>enforce timeouts and basic pacing</p></li><li><p>raise clear errors when a request fails</p></li></ul><p>The code looks like this:</p><pre><code>from __future__ import annotations

import os
import time
from typing import Any, Dict, Optional

import requests
from urllib3.util import Retry
from requests.adapters import HTTPAdapter

from app.settings import FMP_API_KEY, FMP_STABLE_BASE_URL, REQUEST_TIMEOUT, REQUEST_SLEEP


class FMPClient:
    """
    Stable-only client (current docs):
      Base URL: https://financialmodelingprep.com/stable/
      Auth: apikey=&lt;YOUR_KEY&gt;

    Stable quickstart confirms base URL + apikey query auth.
    Historical EOD endpoint lives under Stable as well.
    """

    def __init__(
        self,
        api_key: Optional[str] = None,
        stable_base_url: Optional[str] = None,
        v3_base_url: Optional[str] = None,  # accepted so callers can pass it, but unused: this client is Stable-only
        timeout_s: Optional[int] = None,
        sleep_s: Optional[float] = None,
        session: Optional[requests.Session] = None,
    ) -&gt; None:
        self.api_key = (api_key or FMP_API_KEY or "").strip()
        if not self.api_key:
            raise RuntimeError("Missing FMP_API_KEY. Set it in .env or environment variables.")

        self.base_url = (stable_base_url or FMP_STABLE_BASE_URL or "https://financialmodelingprep.com/stable").rstrip("/")
        self.timeout_s = int(timeout_s if timeout_s is not None else REQUEST_TIMEOUT)
        self.sleep_s = float(sleep_s if sleep_s is not None else REQUEST_SLEEP)

        self.session = session or requests.Session()
        if session is None:
            # Configure retries with backoff, but only on a session we created ourselves
            retry_strategy = Retry(
                total=5,
                backoff_factor=1,
                status_forcelist=[429, 500, 502, 503, 504],
                allowed_methods=["GET"],
                raise_on_status=True
            )
            adapter = HTTPAdapter(max_retries=retry_strategy)
            self.session.mount("https://", adapter)
            self.session.mount("http://", adapter)

    def _get_json(self, endpoint: str, params: Optional[Dict[str, Any]] = None) -&gt; Any:
        params = dict(params or {})
        params["apikey"] = self.api_key

        url = f"{self.base_url}/{endpoint.lstrip('/')}"
        resp = self.session.get(url, params=params, timeout=self.timeout_s)

        if resp.status_code == 402:
            raise RuntimeError(f"FMP 402 (Restricted Endpoint) for {url}: {resp.text[:300]}")

        if not resp.ok:
            raise RuntimeError(f"FMP error {resp.status_code} for {url}: {resp.text[:300]}")

        if self.sleep_s &gt; 0:
            time.sleep(self.sleep_s)

        return resp.json()

    # Symbols
    def fetch_financial_statement_symbol_list(self) -&gt; Any:
        """/stable/financial-statement-symbol-list"""
        return self._get_json("financial-statement-symbol-list")

    def fetch_profile(self, symbol: str) -&gt; Any:
        """/stable/profile?symbol=AAPL"""
        return self._get_json("profile", {"symbol": symbol.upper()})

    # Prices (Stable)
    def fetch_historical_price_eod_full(
        self,
        symbol: str,
        date_from: Optional[str] = None,
        date_to: Optional[str] = None,
    ) -&gt; Any:
        """
        Stable historical EOD (full):
          /historical-price-eod/full?symbol=AAPL
        """
        params: Dict[str, Any] = {"symbol": symbol.upper()}
        if date_from:
            params["from"] = date_from
        if date_to:
            params["to"] = date_to
        return self._get_json("historical-price-eod/full", params)

    # Fundamentals (Stable)
    def fetch_income_statement(self, symbol: str) -&gt; Any:
        return self._get_json("income-statement", {"symbol": symbol.upper()})

    def fetch_balance_sheet(self, symbol: str) -&gt; Any:
        return self._get_json("balance-sheet-statement", {"symbol": symbol.upper()})

    def fetch_cash_flow(self, symbol: str) -&gt; Any:
        return self._get_json("cash-flow-statement", {"symbol": symbol.upper()})</code></pre><p>These endpoints correspond directly to the Stable documentation for company profile, income statement, and historical EOD prices.</p><h3><strong>Step 4: Define the schema and write logic for data storage</strong></h3><p>In this section, we will define what we store and how we update it safely within the <code>db.py</code> file.</p><p>The implementation is as follows:</p><pre><code>import sqlite3
import json
from datetime import datetime
from typing import Optional, Sequence, Tuple


DDL = """
CREATE TABLE IF NOT EXISTS symbols (
  symbol TEXT PRIMARY KEY,
  name TEXT,
  exchange TEXT,
  currency TEXT
);

CREATE TABLE IF NOT EXISTS prices_eod (
  symbol TEXT NOT NULL,
  date TEXT NOT NULL,
  open REAL,
  high REAL,
  low REAL,
  close REAL,
  volume REAL,
  PRIMARY KEY (symbol, date)
);

CREATE TABLE IF NOT EXISTS financials (
  symbol TEXT NOT NULL,
  period_end_date TEXT NOT NULL,
  statement_type TEXT NOT NULL,
  year INTEGER,
  period TEXT,
  payload_json TEXT NOT NULL,
  PRIMARY KEY (symbol, period_end_date, statement_type)
);
"""


def connect(db_path: str) -&gt; sqlite3.Connection:
    import os
    # dirname is "" for a bare filename, and os.makedirs("") raises, so only create a real parent
    parent = os.path.dirname(db_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("PRAGMA synchronous=NORMAL;")
    return conn


def init_db(conn: sqlite3.Connection) -&gt; None:
    conn.executescript(DDL)
    conn.commit()


def upsert_symbols(conn: sqlite3.Connection, rows: Sequence[Tuple]) -&gt; None:
    conn.executemany(
        """
        INSERT INTO symbols (symbol, name, exchange, currency)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(symbol) DO UPDATE SET
            name=excluded.name,
            exchange=excluded.exchange,
            currency=excluded.currency
        """,
        rows,
    )
    conn.commit()


def upsert_prices(conn: sqlite3.Connection, rows: Sequence[Tuple]) -&gt; None:
    conn.executemany(
        """
        INSERT INTO prices_eod (symbol, date, open, high, low, close, volume)
        VALUES (?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT(symbol, date) DO UPDATE SET
            open=excluded.open,
            high=excluded.high,
            low=excluded.low,
            close=excluded.close,
            volume=excluded.volume
        """,
        rows,
    )
    conn.commit()


def upsert_financials(conn: sqlite3.Connection, rows: Sequence[Tuple]) -&gt; None:
    conn.executemany(
        """
        INSERT INTO financials (symbol, period_end_date, statement_type, year, period, payload_json)
        VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT(symbol, period_end_date, statement_type) DO UPDATE SET
            year=excluded.year,
            period=excluded.period,
            payload_json=excluded.payload_json
        """,
        rows,
    )
    conn.commit()
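

# --- Illustrative sketch, not part of the original db.py ---
# Statement rows keep the raw API record in payload_json, so reading a single
# field back means decoding each payload. The helper name and its arguments are
# hypothetical, and "revenue" in the usage line is only an assumed payload key.
def read_statement_field(conn, symbol, statement_type, field):
    import json  # local import so the sketch stands on its own

    cur = conn.execute(
        "SELECT period_end_date, payload_json FROM financials "
        "WHERE symbol = ? AND statement_type = ? "
        "ORDER BY period_end_date",
        (symbol.strip().upper(), statement_type),
    )
    # One (period_end_date, value) pair per stored period, oldest first;
    # value is None when the payload has no such key.
    return [(day, json.loads(payload).get(field)) for day, payload in cur.fetchall()]

# Usage: read_statement_field(conn, "AAPL", "income_statement", "revenue")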


def read_symbols(conn: sqlite3.Connection, limit: Optional[int] = None) -&gt; list[str]:
    q = "SELECT symbol FROM symbols ORDER BY symbol"
    if limit:
        q += " LIMIT ?"
        cur = conn.execute(q, (limit,))
    else:
        cur = conn.execute(q)
    return [r[0] for r in cur.fetchall()]</code></pre><p>The code above is designed as follows:</p><ul><li><p><code>symbols</code> is the reference table</p></li><li><p><code>prices_eod</code> stores daily OHLCV keyed by <code>(symbol, date)</code></p></li><li><p><code>financials</code> stores statement rows keyed by <code>(symbol, period_end_date, statement_type)</code></p></li></ul><p>The purpose of this layer is not only persistence but also operational reliability. With primary keys and upserts in place, we can rerun backfills and daily jobs without creating duplicates.</p><h3><strong>Step 5: Convert API responses into data rows</strong></h3><p>In this section, we will write <code>pipeline.py</code>, the script that defines the ingestion rules. The script should do the following:</p><ol><li><p>normalize FMP response shapes</p></li><li><p>shape raw records into tuples that match table definitions</p></li><li><p>return those tuples so the DB layer can upsert them</p></li></ol><p>The full implementation is as follows:</p><pre><code>from __future__ import annotations

import json
import sqlite3
from typing import Any, Dict, Iterable, List, Optional, Tuple

from app.fmp_client import FMPClient
from app.db import upsert_symbols, upsert_prices, upsert_financials, read_symbols


def _as_list(payload: Any) -&gt; List[Dict[str, Any]]:
    """
    Stable endpoints typically return a JSON array.
    This helper makes the pipeline robust if the response is wrapped.
    """
    if isinstance(payload, list):
        return [x for x in payload if isinstance(x, dict)]
    if isinstance(payload, dict):
        for key in ("data", "results", "historical"):
            v = payload.get(key)
            if isinstance(v, list):
                return [x for x in v if isinstance(x, dict)]
    return []
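

# --- Illustrative sketch, not part of the original pipeline.py ---
# The price rows below store str(dt) as the date. If a response ever carried a
# timestamp such as "2026-02-18 00:00:00" (an assumption, not confirmed by the
# article), the (symbol, date) key would stop matching plain dates. A
# hypothetical helper like this one trims and validates the value first.
def _iso_date(value):
    import datetime  # local import so the sketch stands on its own

    text = str(value)[:10]
    try:
        # Round-trip through date so anything that is not a calendar date is rejected
        return datetime.date.fromisoformat(text).isoformat()
    except ValueError:
        return None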


# 1) Symbols

def seed_symbols(conn: sqlite3.Connection, client: FMPClient, symbols: Optional[Iterable[str]] = None) -&gt; int:
    """
    Seeds the symbols table. If symbols are provided, it enriches them via /profile.
    If none provided, it could fetch a global list (but free tier usually restricts this).
    Returns count of symbols processed.
    """
    if symbols:
        rows: List[Tuple] = []
        for s in symbols:
            sym = s.strip().upper()
            if not sym:
                continue
            prof = client.fetch_profile(sym)
            p = _as_list(prof)
            row = p[0] if p else {}

            name = row.get("companyName") or row.get("name")
            exchange = row.get("exchange") or row.get("exchangeShortName")
            currency = row.get("currency")
            
            rows.append((sym, name, exchange, currency))
        
        if rows:
            upsert_symbols(conn, rows)
        return len(rows)
    else:
        # Fallback to fetching a list if possible (Stable API allows financial-statement-symbol-list)
        payload = client.fetch_financial_statement_symbol_list()
        items = _as_list(payload)
        rows = []
        for r in items:
            sym = (r.get("symbol") or r.get("ticker") or "").strip().upper()
            if not sym:
                continue
            rows.append((
                sym,
                r.get("name") or r.get("companyName"),
                r.get("exchange") or r.get("exchangeShortName"),
                r.get("currency")
            ))
        if rows:
            upsert_symbols(conn, rows)
        return len(rows)

# 2) Prices

def backfill_prices_for_symbol(
    client: FMPClient,
    symbol: str,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    timeseries: Optional[int] = None,  # Optional: keep only the most recent N bars
) -&gt; List[Tuple]:
    """
    Returns rows for upsert_prices:
      (symbol, date, open, high, low, close, volume)
    """
    sym = symbol.strip().upper()
    payload = client.fetch_historical_price_eod_full(sym, date_from=date_from, date_to=date_to)
    bars = _as_list(payload)

    if timeseries:
        # Sort by date first so the slice keeps the newest bars whatever order the API returns
        bars = sorted(bars, key=lambda b: str(b.get("date") or ""))[-int(timeseries):]

    out: List[Tuple] = []
    for b in bars:
        dt = b.get("date") or b.get("datetime") or b.get("time")
        if not dt:
            continue
        out.append((
            sym,
            str(dt),
            b.get("open"),
            b.get("high"),
            b.get("low"),
            b.get("close"),
            b.get("volume")
        ))
    return out


def ingest_prices_for_date(
    conn: sqlite3.Connection,
    client: FMPClient,
    symbols: Iterable[str],
    target_date: str
) -&gt; int:
    """
    Daily run: Fetch exactly one day per symbol and upsert.
    """
    total = 0
    for s in symbols:
        rows = backfill_prices_for_symbol(client, s, date_from=target_date, date_to=target_date)
        if rows:
            upsert_prices(conn, rows)
            total += len(rows)
    return total

# 3) Fundamentals
def refresh_fundamentals(
    conn: sqlite3.Connection,
    client: FMPClient,
    symbols: Iterable[str],
    last_n: int = 4
) -&gt; int:
    """
    Refreshes the latest N financial statements for a watchlist.
    """
    total = 0
    for s in symbols:
        sym = s.strip().upper()
        bundles = [
            ("income_statement", client.fetch_income_statement(sym)),
            ("balance_sheet", client.fetch_balance_sheet(sym)),
            ("cash_flow", client.fetch_cash_flow(sym)),
        ]

        rows_to_upsert = []
        for statement_type, payload in bundles:
            rows = _as_list(payload)
            for r in rows[: int(last_n)]:
                period_end = r.get("date")
                if not period_end:
                    continue

                year = r.get("calendarYear") or r.get("year")
                period = r.get("period")
                
                rows_to_upsert.append((
                    sym,
                    str(period_end),
                    statement_type,
                    year,
                    period,
                    json.dumps(r, ensure_ascii=False)
                ))
        
        if rows_to_upsert:
            upsert_financials(conn, rows_to_upsert)
            total += len(rows_to_upsert)
            
    return total</code></pre><p>Together, the pipeline functions cover the project lifecycle:</p><ul><li><p><strong>Symbols seeding</strong> enriches a watchlist using the profile endpoint and creates rows for <code>symbols</code>. The profile endpoint is documented with <code>symbol</code> as a required query parameter.</p></li><li><p><strong>Price backfill</strong> fetches historical EOD bars, maps each bar to <code>(symbol, date, open, high, low, close, volume)</code>, then returns rows to be upserted into <code>prices_eod</code>.</p></li><li><p><strong>Daily ingestion</strong> uses the same shaping rules but narrows the request to a single target date (typically yesterday), ensuring the daily mode is not a separate system but a constrained version of the same ingestion path.</p></li><li><p><strong>Fundamentals refresh</strong> fetches the latest statement rows and stores them under the composite key <code>(symbol, period_end_date, statement_type)</code>.</p></li></ul><p>The central principle is consistency: every dataset we pull from the FMP API goes through the same shaping and upsert path.</p><h3><strong>Step 6: Create runnable entry points</strong></h3><p>The scripts folder exists so we can run the pipeline without rewriting code each time. Each script should follow the same pattern:</p><ul><li><p>import settings</p></li><li><p>connect and initialise DB</p></li><li><p>instantiate <code>FMPClient</code></p></li><li><p>call pipeline functions</p></li><li><p>upsert results</p></li><li><p>print a concise summary</p></li></ul><p>In this project, the scripts map directly to operational phases:</p><ul><li><p><code>backfill_symbols.py</code> seeds your <code>symbols</code> table from <code>WATCHLIST</code>:</p></li></ul><pre><code>import sys
from pathlib import Path

# Add project root to sys.path
sys.path.append(str(Path(__file__).parent.parent))

from app.settings import (
    DB_PATH, FMP_API_KEY, FMP_STABLE_BASE_URL, FMP_V3_BASE_URL, WATCHLIST
)
from app.db import connect, init_db
from app.fmp_client import FMPClient
from app.pipeline import seed_symbols


def main():
    conn = connect(DB_PATH)
    init_db(conn)

    client = FMPClient(
        api_key=FMP_API_KEY,
        stable_base_url=FMP_STABLE_BASE_URL,
        v3_base_url=FMP_V3_BASE_URL,
    )

    n = seed_symbols(conn, client, WATCHLIST)
    print(f"Seeded {n} symbols into DB ({DB_PATH}) from WATCHLIST.")


if __name__ == "__main__":
    main()</code></pre><ul><li><p><code>backfill_prices.py</code> performs historical loading for <code>prices_eod</code></p></li></ul><pre><code>import sys
from pathlib import Path

# Add project root to sys.path
sys.path.append(str(Path(__file__).parent.parent))

import argparse

from app.settings import (
    DB_PATH, FMP_API_KEY, FMP_STABLE_BASE_URL, FMP_V3_BASE_URL, WATCHLIST
)
from app.db import connect, init_db, read_symbols, upsert_prices
from app.fmp_client import FMPClient
from app.pipeline import backfill_prices_for_symbol


def main() -&gt; None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--limit", type=int, default=None, help="Backfill only first N symbols from DB")
    ap.add_argument("--symbols", type=str, default=None, help="Comma-separated tickers (overrides WATCHLIST)")

    # Optional: limit how much history you pull
    ap.add_argument("--from-date", type=str, default=None, help="YYYY-MM-DD")
    ap.add_argument("--to-date", type=str, default=None, help="YYYY-MM-DD")
    ap.add_argument("--timeseries", type=int, default=None, help="Return last N days")

    args = ap.parse_args()

    conn = connect(DB_PATH)
    init_db(conn)

    if args.symbols:
        symbols = [s.strip().upper() for s in args.symbols.split(",") if s.strip()]
    else:
        # Defaults to watchlist if DB is empty or use current symbols
        db_syms = read_symbols(conn, limit=args.limit)
        symbols = db_syms if db_syms else WATCHLIST

    client = FMPClient(
        api_key=FMP_API_KEY,
        stable_base_url=FMP_STABLE_BASE_URL,
        v3_base_url=FMP_V3_BASE_URL,
    )

    total_rows = 0
    for i, sym in enumerate(symbols, 1):
        rows = backfill_prices_for_symbol(
            client,
            sym,
            date_from=args.from_date,
            date_to=args.to_date,
            timeseries=args.timeseries,
        )
        if rows:
            upsert_prices(conn, rows)
            total_rows += len(rows)

        if i % 25 == 0:
            print(f"Processed {i}/{len(symbols)} symbols...")

    print(f"Done. Upserted {total_rows} price rows.")


if __name__ == "__main__":
    main()</code></pre><ul><li><p><code>run_daily.py</code> runs the daily refresh (yesterday&#8217;s prices + latest fundamentals)</p></li></ul><pre><code>import sys
from pathlib import Path

# Add project root to sys.path
sys.path.append(str(Path(__file__).parent.parent))

import datetime as dt

from app.settings import (
    DB_PATH, FMP_API_KEY, FMP_STABLE_BASE_URL, FMP_V3_BASE_URL, WATCHLIST, FUNDAMENTALS_PERIODS_TO_REFRESH
)
from app.db import connect, init_db
from app.fmp_client import FMPClient
from app.pipeline import ingest_prices_for_date, refresh_fundamentals
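

# --- Illustrative sketch, not part of the original run_daily.py ---
# main() below targets yesterday, which is a Saturday or Sunday when the job
# runs on a Sunday or Monday, so those runs would likely find no new bar.
# This hypothetical helper steps back to the latest weekday instead. It does
# not know about exchange holidays.
def previous_weekday(today):
    import datetime  # local import so the sketch stands on its own

    day = today - datetime.timedelta(days=1)
    while day.weekday() in (5, 6):  # 5 = Saturday, 6 = Sunday
        day -= datetime.timedelta(days=1)
    return day

# Usage inside main(): target_date = previous_weekday(dt.date.today()).isoformat()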


def main():
    # Target yesterday's date; a weekend or holiday date may simply return no price rows
    target_date = (dt.date.today() - dt.timedelta(days=1)).isoformat()

    conn = connect(DB_PATH)
    init_db(conn)

    client = FMPClient(
        api_key=FMP_API_KEY,
        stable_base_url=FMP_STABLE_BASE_URL,
        v3_base_url=FMP_V3_BASE_URL,
    )

    n_prices = ingest_prices_for_date(conn, client, WATCHLIST, target_date)
    n_fin = refresh_fundamentals(conn, client, WATCHLIST, last_n=FUNDAMENTALS_PERIODS_TO_REFRESH)

    print(f"[{target_date}] upserted {n_prices} price rows and {n_fin} fundamentals rows.")


if __name__ == "__main__":
    main()</code></pre><ul><li><p><code>scheduler.py</code> runs <code>run_daily.py</code> on a local schedule and logs output</p></li></ul><pre><code>import sys
from pathlib import Path

# Add project root to sys.path
sys.path.append(str(Path(__file__).parent.parent))

import time
import schedule
import subprocess
import logging

# Make sure the log folder exists; logging.FileHandler raises if "data/" is missing
Path("data").mkdir(parents=True, exist_ok=True)

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("data/scheduler.log"),
        logging.StreamHandler()
    ]
)

def run_job():
    logging.info("Starting daily ingestion job...")
    try:
        # Run run_daily.py as a subprocess
        result = subprocess.run(
            [sys.executable, "scripts/run_daily.py"],
            capture_output=True,
            text=True,
            check=True
        )
        logging.info(f"Job completed successfully:\n{result.stdout}")
    except subprocess.CalledProcessError as e:
        logging.error(f"Job failed with error:\n{e.stderr}")
    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}")

def main():
    # Schedule the job for 01:00 AM every day
    # You can change this time as needed
    schedule.every().day.at("01:00").do(run_job)

    logging.info("Scheduler started. Ingestion job scheduled for 01:00 AM daily.")
    logging.info("Press Ctrl+C to exit.")

    try:
        while True:
            schedule.run_pending()
            time.sleep(60)  # Check every minute
    except KeyboardInterrupt:
        logging.info("Scheduler stopped by user.")

if __name__ == "__main__":
    main()</code></pre><ul><li><p><code>check_db.py</code> verifies table counts, date ranges, and recent rows</p></li></ul><pre><code>import sys
from pathlib import Path

# Add project root to sys.path
sys.path.append(str(Path(__file__).parent.parent))

import sqlite3
import pandas as pd
from app.settings import DB_PATH
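
# --- Illustrative sketch (an assumption, not part of the original script) ---
# The checks in main() print results for a human to read. If you would rather
# have a value you can assert on, a pandas-free helper such as the hypothetical
# latest_price_date below returns the newest date stored in prices_eod.
# It assumes dates are stored as ISO "YYYY-MM-DD" text, as the sample output
# suggests, so that MAX() orders them chronologically.
import sqlite3


def latest_price_date(con):
    row = con.execute("SELECT MAX(date) FROM prices_eod").fetchone()
    return row[0]  # None when the table has no rows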

def main():
    print(f"Checking database at: {DB_PATH}")

    con = sqlite3.connect(DB_PATH)

    try:
        print("\n--- Row Counts ---")
        print(pd.read_sql("SELECT COUNT(*) AS n FROM symbols", con))
        print(pd.read_sql("SELECT COUNT(*) AS n FROM prices_eod", con))
        print(pd.read_sql("SELECT COUNT(*) AS n FROM financials", con))

        print("\n--- Price Statistics ---")
        print(pd.read_sql("SELECT MIN(date) AS min_date, MAX(date) AS max_date FROM prices_eod", con))

        print("\n--- Recent Prices (Last 5) ---")
        print(pd.read_sql("SELECT * FROM prices_eod ORDER BY date DESC LIMIT 5", con))

        print("\n--- Fundamentals Breakdown ---")
        print(pd.read_sql("SELECT statement_type, COUNT(*) AS n FROM financials GROUP BY statement_type", con))
    except Exception as e:
        print(f"Error checking DB: {e}")
    finally:
        con.close()

if __name__ == "__main__":
    main()</code></pre><p>This separation keeps the project maintainable and we are able to improve the pipeline in the future.</p><h3><strong>Step 7: The database generation</strong></h3><p>The <code>data/</code> folder will contain the generated state:</p><ul><li><p><code>fmp.sqlite3</code> (Our SQLite database)</p></li><li><p><code>scheduler.log</code> (Our local scheduler audit trail, if you use it)</p></li></ul><p>Nothing in <code>data/</code> should be required for understanding the code. It is the product of running the pipeline.</p><h3><strong>Step 8: Scheduling (local or GitHub Actions)</strong></h3><p>We have two scheduling modes, which run locally or using GitHub Actions.</p><ul><li><p><strong>Local scheduling</strong> (<code>scripts/scheduler.py</code>) triggers the daily job at a fixed time and writes logs to <code>data/scheduler.log</code>. It is the simplest option when you control the machine.</p></li><li><p><strong>GitHub Actions scheduling</strong> (<code>.github/workflows/daily_ingestion.yml</code>) runs the same daily script on a cron schedule and stores the SQLite database as a workflow artifact. GitHub&#8217;s scheduled workflows are driven by cron syntax and operate in UTC. We can use the YAML file:</p></li></ul><pre><code>name: Daily Data Ingestion

on:
  schedule:
    # Runs at 02:00 UTC every day
    - cron: '0 2 * * *'
  workflow_dispatch:
    # Allows manual triggering

jobs:
  ingest:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout code
      uses: actions/checkout@v3

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.10'

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt

    - name: Run daily ingestion
      env:
        FMP_API_KEY: ${{ secrets.FMP_API_KEY }}
        FMP_WATCHLIST: ${{ vars.FMP_WATCHLIST }}
        DB_PATH: data/fmp.sqlite3
      run: |
        python scripts/run_daily.py

    - name: Upload database
      uses: actions/upload-artifact@v4
      with:
        name: fmp-database
        path: data/fmp.sqlite3</code></pre><p>The important part is that both modes execute the same <code>run_daily.py</code> entry point and therefore share the same ingestion behaviour.</p><p>That is all for the project structure that we built for the daily ingestion pipeline. In the next section, we will go through how to run the scripts step by step.</p><h2><strong>Running the scripts</strong></h2><p>All operational entry points live in the <code>scripts</code> folder. Each script adds the project root to <code>sys.path</code>, so the recommended way to execute them is from the repository root using <code>python scripts/&lt;name&gt;.py</code>.</p><h3><strong>1. Install dependencies</strong></h3><p>From the project root:</p><pre><code>pip install -r requirements.txt</code></pre><p>This installs the minimal runtime stack (<code>requests</code>, <code>python-dotenv</code>, <code>pandas</code>, <code>schedule</code>).</p><h3><strong>2. Configure </strong><code>.env</code></h3><p>Before running anything, ensure <code>.env</code> defines at least:</p><ul><li><p><code>FMP_API_KEY</code> (get the key from the <a href="https://site.financialmodelingprep.com/developer/docs">FMP site</a>)</p></li><li><p><code>FMP_WATCHLIST</code> (comma-separated tickers)</p></li><li><p><code>DB_PATH</code> (for example <code>data/fmp.sqlite3</code>)</p></li></ul><p>Our scripts read these values through <code>app/settings.py</code> and use them consistently across the pipeline.</p><h3><strong>3. Seed symbols into the database</strong></h3><p>Run the following script:</p><pre><code>python scripts/backfill_symbols.py</code></pre><p>This script connects to the SQLite database at <code>DB_PATH</code>, initializes the schema, instantiates <code>FMPClient</code>, and seeds the <code>symbols</code> table using your <code>WATCHLIST</code>.</p><p>When it completes, it prints a confirmation of the number of symbols seeded and the database file used. Something like:</p><pre><code>Seeded 3 symbols into DB (data/fmp.sqlite3) from WATCHLIST.</code></pre><h3><strong>4. Backfill historical prices</strong></h3><p>Run the following script:</p><pre><code>python scripts/backfill_prices.py</code></pre><p>This script is the one-time historical loader for <code>prices_eod</code>. It also initializes the database schema before writing. An example result is shown below:</p><pre><code>Done. Upserted 3768 price rows.</code></pre><p>Symbol selection follows one rule: if you provide <code>--symbols</code>, the script uses that list; otherwise, it reads symbols from the database and falls back to <code>WATCHLIST</code> if the database is empty.</p><p>You can keep the backfill small during testing by using the optional arguments defined in the script:</p><pre><code># Backfill only specific tickers
python scripts/backfill_prices.py --symbols AAPL,MSFT

# Backfill only first N symbols read from the DB
python scripts/backfill_prices.py --limit 10

# Limit history by date range
python scripts/backfill_prices.py --symbols AAPL --from-date 2024-01-01 --to-date 2024-12-31

# Limit history by &#8220;last N days&#8221; returned
python scripts/backfill_prices.py --symbols AAPL --timeseries 200</code></pre><p>These flags correspond directly to the script&#8217;s argument parser (<code>--limit</code>, <code>--symbols</code>, <code>--from-date</code>, <code>--to-date</code>, <code>--timeseries</code>).</p><p>During execution, it prints progress every 25 symbols and ends with the total number of upserted price rows.</p><h3><strong>5. Run the daily ingestion job</strong></h3><p>Run the following script:</p><pre><code>python scripts/run_daily.py</code></pre><p>This is the daily operational entry point. It computes <code>target_date</code> as today minus one day, then performs two actions: it ingests prices for that date and refreshes fundamentals for the watchlist. The fundamentals refresh window is controlled by <code>FUNDAMENTALS_PERIODS_TO_REFRESH</code>.</p><p>An example result looks like this:</p><pre><code>[2026-02-13] upserted 3 price rows and 36 fundamentals rows.</code></pre><h3><strong>6. Verify what was stored in SQLite</strong></h3><p>Run the following script:</p><pre><code>python scripts/check_db.py</code></pre><p>This script is your verification tool. It prints row counts for <code>symbols</code>, <code>prices_eod</code>, and <code>financials</code>, shows min/max dates in <code>prices_eod</code>, prints the five most recent price rows, and summarizes fundamentals by <code>statement_type</code>.</p><p>An example result looks like this:</p><pre><code>Checking database at: data/fmp.sqlite3

--- Row Counts ---
   n
0  3
      n
0  3777
    n
0  36

--- Price Statistics ---
     min_date    max_date
0  2021-02-10  2026-02-13

--- Recent Prices (Last 5) ---
  symbol        date    open    high     low   close      volume
0   AAPL  2026-02-13  262.01  262.23  255.45  255.78  54927132.0
1   MSFT  2026-02-13  404.45  405.54  398.05  401.32  33949805.0
2   TSLA  2026-02-13  414.31  424.06  410.88  417.44  50565054.0
3   AAPL  2026-02-12  275.59  275.72  260.18  261.73  81077229.0
4   MSFT  2026-02-12  405.00  406.20  398.01  401.84  40802400.0

--- Fundamentals Breakdown ---
     statement_type   n
0     balance_sheet  12
1         cash_flow  12
2  income_statement  12</code></pre><p>This script is used for a quick check after backfills or the daily job.</p><h3><strong>7. Automate the daily run</strong></h3><p>First, let&#8217;s take a look at the local scheduler, which runs on our machine:</p><p>Run the following script:</p><pre><code>python scripts/scheduler.py</code></pre><p>This schedules the job daily at 01:00 AM and runs <code>scripts/run_daily.py</code> as a subprocess, writing logs to <code>data/scheduler.log</code> and to stdout.</p><h3><strong>GitHub Actions (hosted schedule)</strong></h3><p>The workflow runs at <strong>02:00 UTC</strong> daily, sets <code>FMP_API_KEY</code>, <code>FMP_WATCHLIST</code>, and <code>DB_PATH=data/fmp.sqlite3</code>, then executes <code>python scripts/run_daily.py</code> and uploads the SQLite file as an artifact. The workflow only starts running once the workflow file has been pushed to the GitHub repository.</p><p>That&#8217;s all you need to understand how to build the daily ingestion pipeline with FMP.</p><h2><strong>Conclusion</strong></h2><p>In this article, we have learned how to build a small but reliable daily ingestion workflow that keeps two core financial datasets current: end-of-day prices and company fundamentals.</p><p>By relying on <a href="https://site.financialmodelingprep.com/?utm_source=medium&amp;utm_medium=medium&amp;utm_campaign=corn11">Financial Modeling Prep</a>&#8217;s Stable API as the single upstream source, the pipeline remains consistent in how it authenticates, requests data, and standardizes responses, while remaining practical for routine use in research, screening, and internal analytics.</p><p>I hope it has helped!</p>]]></content:encoded></item><item><title><![CDATA[How to Build an Earnings Briefing Engine Using the FMP API]]></title><description><![CDATA[A repeatable pipeline that turns earnings prep into one-page briefs]]></description><link>https://www.nb-data.com/p/how-to-build-an-earnings-briefing</link><guid 
isPermaLink="false">https://www.nb-data.com/p/how-to-build-an-earnings-briefing</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Fri, 13 Feb 2026 02:21:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!O0Vg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f2d5638-c795-4b9b-9326-14bd513f3b6c_1120x747.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O0Vg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f2d5638-c795-4b9b-9326-14bd513f3b6c_1120x747.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O0Vg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f2d5638-c795-4b9b-9326-14bd513f3b6c_1120x747.jpeg 424w, https://substackcdn.com/image/fetch/$s_!O0Vg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f2d5638-c795-4b9b-9326-14bd513f3b6c_1120x747.jpeg 848w, https://substackcdn.com/image/fetch/$s_!O0Vg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f2d5638-c795-4b9b-9326-14bd513f3b6c_1120x747.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!O0Vg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f2d5638-c795-4b9b-9326-14bd513f3b6c_1120x747.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!O0Vg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f2d5638-c795-4b9b-9326-14bd513f3b6c_1120x747.jpeg" width="1120" height="747" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f2d5638-c795-4b9b-9326-14bd513f3b6c_1120x747.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:747,&quot;width&quot;:1120,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!O0Vg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f2d5638-c795-4b9b-9326-14bd513f3b6c_1120x747.jpeg 424w, https://substackcdn.com/image/fetch/$s_!O0Vg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f2d5638-c795-4b9b-9326-14bd513f3b6c_1120x747.jpeg 848w, https://substackcdn.com/image/fetch/$s_!O0Vg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f2d5638-c795-4b9b-9326-14bd513f3b6c_1120x747.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!O0Vg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f2d5638-c795-4b9b-9326-14bd513f3b6c_1120x747.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@jakubzerdzicki?utm_source=medium&amp;utm_medium=referral">Jakub &#379;erdzicki</a> on <a href="https://unsplash.com/?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure></div><p>Earnings weeks are a compression problem. Many companies report within a short window, yet the preparation work is scattered across multiple sources.</p><p>The workflow is repetitive and time-sensitive, especially when you follow more than a few tickers. When you assemble earnings context one company at a time, you repeat the same steps for every symbol. 
Over time, briefings become inconsistent because each run follows a slightly different process.</p><p>This is where an earnings briefing engine becomes useful. It converts an ad hoc workflow into a repeatable pipeline, producing a consistent one-page brief for each ticker. It also makes the process easier to audit and extend over time.</p><p>In this article, we will build a minimal earnings briefing engine using <a href="https://site.financialmodelingprep.com/?utm_source=medium&amp;utm_medium=medium&amp;utm_campaign=corn10">Financial Modeling Prep&#8217;s</a> stable endpoints.</p><p>Curious about it? Let&#8217;s get into it.</p><h2><strong>Foundation</strong></h2><p>An earnings briefing engine is a compact workflow that summarizes the key information you need before an earnings event. It does not attempt to forecast returns, and it does not replace deep research. Instead, its purpose is operational: it creates a briefing document we can store and reuse.</p><p>The Earnings Briefing Engine that we will build does three things:</p><ul><li><p>Fetch upcoming earnings events for a date window using the earnings calendar endpoint.</p></li><li><p>Build a standardized per-ticker bundle that captures event context, expectations, and recent financial performance.</p></li><li><p>Generate a consistent one-page briefing in a simple format.</p></li></ul><h2><strong>The Data Source</strong></h2><p>This project uses <a href="https://site.financialmodelingprep.com/?utm_source=medium&amp;utm_medium=medium&amp;utm_campaign=corn10">Financial Modeling Prep (FMP)</a> as the primary data source. 
FMP publishes an extensive catalog of financial datasets through its Stable API. The platform provides over 100 documented endpoints. It also offers additional delivery options, including WebSocket streaming and bulk downloads for selected datasets.</p><p>FMP Stable uses a simple base URL, and authentication is handled via an API key passed as a query parameter.</p><pre><code>Base URL: https://financialmodelingprep.com/stable/
Auth: apikey=&lt;YOUR_KEY&gt;</code></pre><p>This briefing engine is built around a small set of Stable endpoints. Each endpoint maps to a section in the final one-page brief:</p><ul><li><p><strong>Earnings Calendar</strong> (<code>earnings-calendar</code>) provides upcoming and past earnings events. It includes the announcement date and EPS fields when available.</p></li><li><p><strong>Analyst Estimates </strong>(<code>analyst-estimates</code>) provides forecasted revenue and EPS. This supports the market expectations section.</p></li><li><p><strong>Company Profile</strong> (<code>profile</code>) provides a company snapshot such as sector, price, and market capitalization.</p></li><li><p><strong>Income Statement </strong>(<code>income-statement</code>) provides historical statement rows for trend context.</p></li><li><p><strong>Key Metrics </strong>(<code>key-metrics</code>) provides common KPIs used for compact metric blocks.</p></li></ul><p>These are the data we will retrieve from the FMP API, and we will build the system based on it.</p><h2><strong>What the Earnings Briefing Engine Does</strong></h2><ol><li><p><strong>Pull upcoming earnings events for a date window</strong><br>The engine queries the Stable Earnings Calendar endpoint with <code>from</code> and <code>to</code>. It returns upcoming announcements and may include EPS fields when available.</p></li><li><p><strong>Extract symbols and de-duplicate</strong><br>From the calendar response, the engine extracts <code>symbol</code> values. 
It keeps first-occurrence order and removes duplicates.</p></li><li><p><strong>Fetch a fixed dataset per symbol</strong><br>For each ticker, the engine calls a small and explicit set of endpoints:</p><ul><li><p>Company Profile for the snapshot context.</p></li><li><p>Analyst Estimates for revenue and EPS expectations.</p></li><li><p>Optional fundamentals endpoints, such as Income Statement and Key Metrics, when you want trend and KPI blocks.</p></li></ul></li><li><p><strong>Normalize responses into a stable ticker bundle</strong><br>Each API response is mapped into a predictable internal schema. Missing datasets become empty objects or empty lists.</p></li><li><p><strong>Render a one-page briefing from the bundle</strong><br>A single renderer transforms the bundle into a consistent Markdown brief.</p></li><li><p><strong>Save outputs to disk for reuse</strong><br>Each ticker corresponds to a Markdown file in the output folder.</p></li><li><p><strong>Repeat the workflow with different inputs</strong><br>We can rerun the engine with a different date window or a watchlist.</p></li></ol><h2><strong>Project Architecture</strong></h2><p>This project stays intentionally small. The goal is a single, clear pipeline with two entry points, rather than spreading behavior across many scripts.</p><pre><code>earnings_briefing_engine/
&#9500;&#9472; app/
&#9474;  &#9500;&#9472; __init__.py
&#9474;  &#9500;&#9472; config.py            # loads API key and stable base URL once
&#9474;  &#9500;&#9472; fmp_client.py        # HTTP wrapper, apikey injection, error handling
&#9474;  &#9500;&#9472; engine.py            # calendar or watchlist &#8594; bundle &#8594; briefing orchestration
&#9474;  &#9492;&#9472; render_markdown.py   # one-page Markdown template renderer
&#9500;&#9472; output/
&#9474;  &#9492;&#9472; briefings/           # generated files, one per ticker
&#9500;&#9472; .env                    # local configuration
&#9500;&#9472; requirements.txt
&#9500;&#9472; run.py                  # upcoming earnings window mode
&#9500;&#9472; run_watchlist.py        # fixed watchlist mode
&#9492;&#9472; output.txt              # optional run log or notes</code></pre><p>Here are explanations for each of the scripts&#8217; purposes:</p><h3><strong>app/config.py</strong></h3><p>Stores the single source of truth for configuration. This includes <code>FMP_BASE_URL=https://financialmodelingprep.com/stable</code> and your API key. FMP authenticates requests by appending <code>apikey=...</code> to each request.</p><h3><strong>app/fmp_client.py</strong></h3><p>A thin client that constructs URLs, attaches <code>apikey</code>, sets timeouts, and normalizes errors. This keeps API details out of the business logic. The calling pattern follows FMP&#8217;s Stable base URL and query authentication.</p><h3><strong>app/engine.py</strong></h3><p>The orchestration layer. It runs the numbered flow defined earlier:</p><ul><li><p>In calendar mode, it calls <code>earnings-calendar</code> with <code>from</code> and <code>to</code>.</p></li><li><p>It extracts and de-duplicates symbols.</p></li><li><p>It fetches a fixed set of per-ticker datasets, then normalizes them into a stable bundle.</p></li><li><p>In watchlist mode, it can populate the event context with the per-company earnings endpoint <code>earnings</code>.</p></li><li><p>It then calls the renderer and writes output files.</p></li></ul><h3><strong>app/render_markdown.py</strong></h3><p>Converts the normalized bundle into a one-page briefing with consistent headings. Markdown is used because it is portable, diffable, and easy to store. You can add HTML or PDF later without changing the data pipeline.</p><h3><strong>output/briefings/</strong></h3><p>Holds the generated artifacts. A practical convention is one file per ticker per event date, for example <code>AAPL_2026-02-06.md</code>. This creates a durable record you can re-run and compare over time.</p><h2><strong>Building the Earnings Briefing Engine</strong></h2><p>Let&#8217;s start to build our engine. 
We will break it down step by step.</p><h3><strong>Step 1: Create the environment</strong></h3><p>Start with a virtual environment and install only what you need by listing the dependencies in <code>requirements.txt</code>:</p><pre><code>requests&gt;=2.31.0
python-dotenv&gt;=1.0.0</code></pre><p>We will use <code>requests</code> for API calls and <code>python-dotenv</code> to load secrets from <code>.env</code>.</p><h3><strong>Step 2: Add a </strong><code>.env</code><strong> file for configuration</strong></h3><p>Create <code>.env</code> at the project root and store:</p><pre><code>FMP_API_KEY=YOUR_API_KEY
FMP_BASE_URL=https://financialmodelingprep.com/stable</code></pre><p>The Stable base URL is the canonical starting point for the endpoints used in this tutorial.</p><h3><strong>Step 3: Load settings once in </strong><code>app/config.py</code></h3><p>Keep configuration in one place. The engine should not read environment variables inside business logic. It should receive a settings object.</p><p>The <code>config.py</code> will have the following code:</p><pre><code>import os
from dataclasses import dataclass

from dotenv import load_dotenv

load_dotenv()

@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    out_dir: str = "output/briefings"


def get_settings() -&gt; Settings:
    """
    This project targets FMP Stable endpoints:
      https://financialmodelingprep.com/stable/...
    """
    api_key = os.getenv("FMP_API_KEY", "").strip()
    base_url = os.getenv("FMP_BASE_URL", "").strip()

    if not api_key:
        raise RuntimeError("Missing FMP_API_KEY. Set it in your environment or in a .env file.")

    # Default to Stable API.
    if not base_url:
        base_url = "https://financialmodelingprep.com/stable"

    # Auto-correct common misconfiguration.
    if "/api/v3" in base_url:
        base_url = "https://financialmodelingprep.com/stable"

    return Settings(api_key=api_key, base_url=base_url)</code></pre><p>In this step, we define:</p><ul><li><p><code>api_key</code></p></li><li><p><code>base_url</code></p></li><li><p><code>out_dir</code></p></li></ul><p>This aligns with the Stable API pattern and ensures consistent requests.</p><h3><strong>Step 4: Build a small client in </strong><code>app/fmp_client.py</code></h3><p>Next, we will build our FMP client using the following code:</p><pre><code>from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Dict, Optional

import requests


def _redact_apikey(url: str) -&gt; str:
    if "apikey=" not in url:
        return url
    return url.split("apikey=")[0] + "apikey=REDACTED"
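

# --- Illustrative sketch (an assumption, not part of the original client) ---
# _redact_apikey above cuts the URL at "apikey=", so any query parameters that
# come after the key are lost from the logged URL. If you would like to keep
# them, one option is to parse the query string and mask only the key's value.
# The helper name _redact_apikey_param is hypothetical.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _redact_apikey_param(url):
    parts = urlsplit(url)
    pairs = [
        (key, "REDACTED" if key == "apikey" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # Rebuild the URL with the same scheme, host, and path, and the masked query.
    return urlunsplit(parts._replace(query=urlencode(pairs)))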


@dataclass(frozen=True)
class FmpClient:
    """
    Minimal HTTP client for FMP Stable endpoints.
    """
    api_key: str
    base_url: str = "https://financialmodelingprep.com/stable"
    timeout_s: int = 30
    max_retries: int = 2  # only for 429

    def get_json(
        self,
        path: str,
        params: Optional[Dict[str, Any]] = None,
        *,
        allow_plan_errors: bool = True,
    ) -&gt; Any:
        """
        If allow_plan_errors is True:
          - 402 (Payment Required) -&gt; None
          - 403 (Forbidden) -&gt; None
        """
        base = self.base_url.rstrip("/")
        url = f"{base}/{path.lstrip('/')}"

        q = dict(params or {})
        q["apikey"] = self.api_key

        attempts = 0
        while True:
            attempts += 1
            resp = requests.get(url, params=q, timeout=self.timeout_s)

            if allow_plan_errors and resp.status_code in (402, 403):
                return None

            if resp.status_code == 429 and attempts &lt;= self.max_retries:
                retry_after = resp.headers.get("Retry-After")
                wait_s = int(retry_after) if (retry_after and retry_after.isdigit()) else (1 + attempts)
                import time
                time.sleep(wait_s)
                continue

            if resp.status_code == 401:
                raise requests.HTTPError(f"Unauthorized (401) for {_redact_apikey(resp.url)}", response=resp)

            resp.raise_for_status()
            return resp.json()</code></pre><p>Our client should do only four things:</p><ul><li><p>Construct <code>base_url + path</code></p></li><li><p>Attach <code>apikey</code> to query parameters</p></li><li><p>Set timeouts</p></li><li><p>Normalize common errors</p></li></ul><p>FMP documents API key usage via query parameters, and also notes header-based auth as an alternative.</p><h3><strong>Step 5: Implement the full pipeline in </strong><code>app/engine.py</code></h3><p>We will implement the whole earnings briefing flow in <code>engine.py</code> with the code below:</p><pre><code>from __future__ import annotations

from datetime import date, timedelta
from pathlib import Path
from typing import Any, Dict, List, Optional

from app.fmp_client import FmpClient
from app.render_markdown import render_markdown


def _dedupe_keep_order(items: List[str]) -&gt; List[str]:
    seen = set()
    out: List[str] = []
    for s in items:
        s = (s or "").strip().upper()
        if s and s not in seen:
            seen.add(s)
            out.append(s)
    return out

def _first_dict(x: Any) -&gt; Dict[str, Any]:
    return x[0] if isinstance(x, list) and x and isinstance(x[0], dict) else {}


def fetch_earnings_calendar(client: FmpClient, start: date, end: date) -&gt; List[Dict[str, Any]]:
    """
    Earnings Calendar (stable):
      GET /earnings-calendar?from=YYYY-MM-DD&amp;to=YYYY-MM-DD
    Docs: https://financialmodelingprep.com/stable/earnings-calendar
    """
    data = client.get_json(
        "earnings-calendar",
        {"from": start.isoformat(), "to": end.isoformat()},
        allow_plan_errors=True,
    )
    return data or []


def fetch_profile(client: FmpClient, symbol: str) -&gt; Dict[str, Any]:
    &#8220;&#8221;&#8220;
    Company Profile (stable):
      GET /profile?symbol=SYMBOL
    Docs: https://financialmodelingprep.com/stable/profile?symbol=AAPL
    &#8220;&#8221;&#8220;
    data = client.get_json(&#8221;profile&#8221;, {&#8221;symbol&#8221;: symbol}, allow_plan_errors=True)
    return _first_dict(data)


# Optional (may be plan-limited depending on account)
def fetch_analyst_estimates(client: FmpClient, symbol: str, *, period: str = &#8220;quarter&#8221;, limit: int = 8, page: int = 0) -&gt; List[Dict[str, Any]]:
    data = client.get_json(
        &#8220;analyst-estimates&#8221;,
        {&#8221;symbol&#8221;: symbol, &#8220;period&#8221;: period, &#8220;page&#8221;: page, &#8220;limit&#8221;: limit},
        allow_plan_errors=True,
    )
    return data or []


def fetch_income_statement(client: FmpClient, symbol: str, *, period: str = &#8220;quarter&#8221;, limit: int = 8) -&gt; List[Dict[str, Any]]:
    data = client.get_json(
        &#8220;income-statement&#8221;,
        {&#8221;symbol&#8221;: symbol, &#8220;period&#8221;: period, &#8220;limit&#8221;: limit},
        allow_plan_errors=True,
    )
    return data or []


def fetch_key_metrics(client: FmpClient, symbol: str, *, period: str = &#8220;quarter&#8221;, limit: int = 8) -&gt; List[Dict[str, Any]]:
    data = client.get_json(
        &#8220;key-metrics&#8221;,
        {&#8221;symbol&#8221;: symbol, &#8220;period&#8221;: period, &#8220;limit&#8221;: limit},
        allow_plan_errors=True,
    )
    return data or []


def fetch_stock_news(client: FmpClient, symbol: str, *, limit: int = 20) -&gt; List[Dict[str, Any]]:
    data = client.get_json(&#8221;news/stock&#8221;, {&#8221;symbols&#8221;: symbol, &#8220;limit&#8221;: limit}, allow_plan_errors=True)
    return data or []


def fetch_press_releases(client: FmpClient, symbol: str, *, limit: int = 20) -&gt; List[Dict[str, Any]]:
    data = client.get_json(&#8221;news/press-releases&#8221;, {&#8221;symbols&#8221;: symbol, &#8220;limit&#8221;: limit}, allow_plan_errors=True)
    return data or []
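

# Optional sketch, not part of the original engine: a small in-memory cache so
# a single run never requests the same endpoint and params twice. The names
# _CACHE and cached_get_json are hypothetical; the only assumption is that the
# client exposes get_json(path, params, allow_plan_errors=...) as used above.
# To try it, call cached_get_json(client, path, params) inside a fetch_* helper.
_CACHE = {}


def cached_get_json(client, path, params):
    # Build a hashable key from the path and the sorted query parameters.
    key = (path, tuple(sorted(params.items())))
    if key not in _CACHE:
        _CACHE[key] = client.get_json(path, params, allow_plan_errors=True)
    return _CACHE[key]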


def build_bundle(
    client: FmpClient,
    symbol: str,
    *,
    event: Optional[Dict[str, Any]] = None,
    include_estimates: bool = False,
    include_financials: bool = False,
    include_news: bool = False,
    statements_period: str = "quarter",
    statements_limit: int = 8,
) -&gt; Dict[str, Any]:
    profile = fetch_profile(client, symbol)

    estimates: List[Dict[str, Any]] = []
    income: List[Dict[str, Any]] = []
    key_metrics: List[Dict[str, Any]] = []
    news: List[Dict[str, Any]] = []
    press: List[Dict[str, Any]] = []

    if include_estimates:
        estimates = fetch_analyst_estimates(client, symbol, period=statements_period, limit=statements_limit)

    if include_financials:
        income = fetch_income_statement(client, symbol, period=statements_period, limit=statements_limit)
        key_metrics = fetch_key_metrics(client, symbol, period=statements_period, limit=statements_limit)

    if include_news:
        news = fetch_stock_news(client, symbol)
        press = fetch_press_releases(client, symbol)

    return {
        "symbol": symbol,
        "event": event or {},
        "profile": profile,
        "estimates": estimates,
        "income": income,
        "key_metrics": key_metrics,
        "news": news,
        "press": press,
    }

def run(
    settings: Any,
    *,
    days_ahead: int = 7,
    limit: int = 10,
    symbols: Optional[List[str]] = None,
    include_estimates: bool = False,
    include_financials: bool = False,
    include_news: bool = False,
) -&gt; None:
    """
    Two modes:
      1) Calendar mode (default): pull upcoming earnings, then build briefs.
      2) Watchlist mode: pass symbols=[...].
    """
    client = FmpClient(api_key=settings.api_key, base_url=settings.base_url)

    out_dir = Path(getattr(settings, "out_dir", "output/briefings"))
    out_dir.mkdir(parents=True, exist_ok=True)

    events_by_symbol: Dict[str, Dict[str, Any]] = {}
    if symbols:
        target_symbols = _dedupe_keep_order(symbols)
    else:
        start = date.today()
        end = start + timedelta(days=days_ahead)
        events = fetch_earnings_calendar(client, start, end)

        for e in events:
            sym = (e.get("symbol") or "").strip().upper()
            if sym:
                events_by_symbol.setdefault(sym, e)

        target_symbols = list(events_by_symbol.keys())[:limit]

    if not target_symbols:
        print(
            "No symbols returned.\n"
            "Confirm your base URL is https://financialmodelingprep.com/stable and your API key is valid.\n"
            "If you are on the free tier, some datasets may be restricted."
        )
        return

    for i, sym in enumerate(target_symbols, start=1):
        bundle = build_bundle(
            client,
            sym,
            event=events_by_symbol.get(sym),
            include_estimates=include_estimates,
            include_financials=include_financials,
            include_news=include_news,
        )
        md = render_markdown(bundle)

        out_path = out_dir / f"{sym}.md"
        out_path.write_text(md, encoding="utf-8")
        print(f"[{i}/{len(target_symbols)}] wrote {out_path}")</code></pre><p>This is where the engine becomes a repeatable workflow. The code above does the following:</p><ol><li><p><strong>Pull a calendar window.</strong> Call <code>earnings-calendar</code> with <code>from</code> and <code>to</code>. This yields the earnings events in that window, including EPS fields when available.</p></li><li><p><strong>Extract symbols and de-duplicate. </strong>Read the <code>symbol</code> field from the calendar results. De-duplicate while preserving order. Apply a small <code>limit</code> so runs remain predictable.</p></li><li><p><strong>Fetch a fixed set of datasets per ticker.</strong> Use the same calls for every symbol. Start with the essentials, then treat deeper fundamentals as optional. <code>profile?symbol=...</code> for snapshot fields such as sector and market cap. <code>analyst-estimates?symbol=...&amp;period=...&amp;page=...&amp;limit=...</code> for revenue and EPS expectations when <code>include_estimates=True</code>. Optionally, with <code>include_financials=True</code>, <code>income-statement</code> for trend context and <code>key-metrics</code> for compact KPI blocks.</p></li><li><p><strong>Normalize into a stable bundle schema. </strong>Map responses into one predictable shape, then pass that shape downstream. Missing datasets are represented as <code>{}</code> or <code>[]</code>. This keeps rendering stable even when some endpoints return no data on a given plan.</p></li><li><p><strong>Write one artifact per ticker. </strong>For each bundle, call the renderer and save the Markdown into <code>output/briefings/</code>.</p></li></ol><p>In watchlist mode, the event context is empty by default. You can populate it using the per-company earnings endpoint, then reuse the same bundle and rendering path.</p><h3><strong>Step 6: Render the one-page brief in </strong><code>app/render_markdown.py</code></h3><p>Next, we set up <code>render_markdown.py</code> with the following code:</p><pre><code>from __future__ import annotations

from datetime import datetime
from typing import Any, Dict, List, Optional

Json = Dict[str, Any]


def _first_dict(x: Any) -&gt; Json:
    if isinstance(x, list) and x and isinstance(x[0], dict):
        return x[0]
    if isinstance(x, dict):
        return x
    return {}


def _as_list_of_dicts(x: Any) -&gt; List[Json]:
    if isinstance(x, list):
        return [i for i in x if isinstance(i, dict)]
    return []


def _get_first_present(d: Json, keys: List[str], default: Any = "N/A") -&gt; Any:
    for k in keys:
        v = d.get(k)
        if v is not None and v != "":
            return v
    return default


def _fmt_num(x: Any) -&gt; str:
    """Format a number compactly: B/M suffixes for large values, thousands
    separators from 1,000 up, and 4 significant digits below that.
    Numeric strings such as "1,250" are parsed first; None and bools give "N/A".
    """
    if x is None:
        return "N/A"
    try:
        if isinstance(x, bool):
            return "N/A"
        if isinstance(x, (int, float)):
            if abs(x) &gt;= 1_000_000_000:
                return f"{x/1_000_000_000:.2f}B"
            if abs(x) &gt;= 1_000_000:
                return f"{x/1_000_000:.2f}M"
            if abs(x) &gt;= 1_000:
                return f"{x:,.0f}"
            return f"{x:.4g}"
        # Strings: strip thousands separators, then format as a float.
        xf = float(str(x).replace(",", ""))
        return _fmt_num(xf)
    except Exception:
        return str(x)

def render_markdown(bundle: Json) -&gt; str:
    sym = bundle.get("symbol", "N/A")

    event = bundle.get("event") or {}
    profile = bundle.get("profile") or {}
    estimates = _as_list_of_dicts(bundle.get("estimates"))
    key_metrics = _as_list_of_dicts(bundle.get("key_metrics"))
    income = _as_list_of_dicts(bundle.get("income"))
    news = _as_list_of_dicts(bundle.get("news"))
    press = _as_list_of_dicts(bundle.get("press"))

    est0 = _first_dict(estimates)
    km0 = _first_dict(key_metrics)

    company = _get_first_present(profile, ["companyName", "name"], sym)
    sector = _get_first_present(profile, ["sector"], "N/A")

    # The market cap key is commonly "marketCap" on profile payloads; the others are fallbacks.
    mcap = _get_first_present(profile, ["marketCap", "mktCap", "marketCapitalization"], None)
    price = _get_first_present(profile, ["price"], None)

    event_date = _get_first_present(event, ["date", "earningDate"], "N/A")
    event_time = _get_first_present(event, ["time", "timeEstimated"], "N/A")

    # Prefer analyst estimates if present, otherwise fall back to the calendar row.
    eps_est = _get_first_present(est0, ["estimatedEps", "epsEstimated"], None)
    if eps_est in (None, "N/A"):
        eps_est = _get_first_present(event, ["epsEstimated", "estimatedEps"], None)

    rev_est = _get_first_present(est0, ["estimatedRevenue", "revenueEstimated"], None)
    if rev_est in (None, "N/A"):
        rev_est = _get_first_present(event, ["revenueEstimated", "estimatedRevenue"], None)

    lines: List[str] = []
    lines.append(f"# Earnings Briefing: {company} ({sym})")
    lines.append("")
    lines.append("## Event")
    lines.append(f"- Date: {event_date}")
    lines.append(f"- Time: {event_time}")
    lines.append("")
    lines.append("## Snapshot")
    lines.append(f"- Sector: {sector}")
    lines.append(f"- Price: {_fmt_num(price)}")
    lines.append(f"- Market cap: {_fmt_num(mcap)}")
    lines.append("")

    lines.append("## Expectations")
    lines.append(f"- Estimated EPS: {_fmt_num(eps_est)}")
    lines.append(f"- Estimated revenue: {_fmt_num(rev_est)}")
    lines.append("")

    # Only show these sections if you enabled them (or if your plan returns data).
    if km0:
        lines.append("## Key metrics (latest)")
        lines.append(f"- P/E: {_fmt_num(km0.get('peRatio'))}")
        lines.append(f"- Net margin: {_fmt_num(km0.get('netProfitMargin'))}")
        lines.append("")

    if income:
        lines.append("## Trend context")
        lines.append("- Financial statements were fetched (see JSON bundle for details).")
        lines.append("")

    if news or press:
        lines.append("## Recent context")
        if news:
            lines.append("- Stock news:")
            for n in news[:3]:
                title = _get_first_present(n, ["title"], None)
                pub = _get_first_present(n, ["publishedDate", "date"], None)
                if title:
                    lines.append(f"  - {title}" + (f" ({pub})" if pub else ""))
        if press:
            lines.append("- Press releases:")
            for p in press[:3]:
                title = _get_first_present(p, ["title"], None)
                pub = _get_first_present(p, ["date", "publishedDate"], None)
                if title:
                    lines.append(f"  - {title}" + (f" ({pub})" if pub else ""))
        lines.append("")

    lines.append("## Questions to listen for")
    lines.append("- What changed in demand, pricing, or volume versus last quarter?")
    lines.append("- What is driving margin movement?")
    lines.append("- What guidance signals matter most for the next two quarters?")
    lines.append("")
    lines.append(f"_Generated at {datetime.utcnow().strftime('%Y-%m-%d %H:%M UTC')}. Not financial advice._")
    return "\n".join(lines)</code></pre><p>The renderer code above takes the bundle and produces a consistent Markdown page:</p><ul><li><p>The Event section uses the calendar row</p></li><li><p>The Snapshot section uses profile fields</p></li><li><p>The Expectations section uses analyst estimates, with a fallback to calendar fields</p></li><li><p>Optional sections appear only when data exists</p></li></ul><h3><strong>Step 7: Add entry points for two run modes</strong></h3><p>We keep the entry points thin:</p><ul><li><p><code>run.py</code> for &#8220;upcoming earnings&#8221; mode. It queries the <code>earnings-calendar</code> window and generates briefings for the symbols in that window:</p></li></ul><pre><code>from app.config import get_settings
from app.engine import run

if __name__ == "__main__":
    settings = get_settings()
    run(settings, days_ahead=7, limit=10)</code></pre><ul><li><p><code>run_watchlist.py</code> for &#8220;watchlist&#8221; mode. It runs the same bundle and renderer, but starts from a fixed list of symbols:</p></li></ul><pre><code>from app.config import get_settings
from app.engine import run

WATCHLIST = ["AAPL", "MSFT", "NVDA", "TSLA"]

if __name__ == "__main__":
    settings = get_settings()
    run(settings, symbols=WATCHLIST)</code></pre><p>If you want watchlist mode to always show earnings context, you can enrich it with the per-company earnings endpoint.</p><h3><strong>Step 8: Verify outputs and iterate safely</strong></h3><p>A successful run should produce one Markdown file per ticker under <code>output/briefings/</code>. For example:</p><pre><code># Earnings Briefing: Shopify Inc. (SHOP)

## Event
- Date: 2026-02-11
- Time: N/A

## Snapshot
- Sector: Technology
- Price: 112.9
- Market cap: 147.38B

## Expectations
- Estimated EPS: 0.5
- Estimated revenue: 3.59B

## Questions to listen for
- What changed in demand, pricing, or volume versus last quarter?
- What is driving margin movement?
- What guidance signals matter most for the next two quarters?

_Generated at 2026-02-05 17:11 UTC. Not financial advice._</code></pre><p>If you see missing event dates, expand the calendar window. If you see missing expectations, confirm that estimates are enabled (<code>include_estimates=True</code>) and available for those symbols. If you hit request limits, reduce the batch size or add caching. The Basic plan call limit is published in FMP&#8217;s plan comparison.</p><p>That&#8217;s all you need to build an Earnings Briefing Engine using the FMP API.</p><h2><strong>Conclusion</strong></h2><p>In this article, we learned how to build an earnings briefing engine that reduces manual effort during earnings weeks by enforcing a repeatable workflow.</p><p>Using <a href="https://site.financialmodelingprep.com/?utm_source=medium&amp;utm_medium=medium&amp;utm_campaign=corn10">Financial Modeling Prep (FMP)</a> as the primary data source, the process relies on a stable API to retrieve earnings events and selected supporting context, then summarizes the results into a standardized one-page briefing format that can be stored and reused.</p><p>In practice, this system will be useful for maintaining a disciplined pre-earnings routine, supporting watchlist management during busy reporting weeks, and creating a written record of what to review before each announcement.</p><p>I hope it has helped!</p>]]></content:encoded></item><item><title><![CDATA[NBD Focus Map (Free PDF)]]></title><description><![CDATA[A simple 3-track plan to stop learning randomly and start shipping real work]]></description><link>https://www.nb-data.com/p/nbd-focus-map-free-pdf</link><guid isPermaLink="false">https://www.nb-data.com/p/nbd-focus-map-free-pdf</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Sun, 01 Feb 2026 12:36:35 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!UAOl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UAOl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UAOl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!UAOl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!UAOl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!UAOl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UAOl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:137128,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/186488912?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UAOl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!UAOl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!UAOl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!UAOl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc40430b9-c032-4f62-9956-2ee3fa2b8b22_1600x900.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Most people do not struggle because they lack effort. They struggle because they learn without a plan.</p><p>The Focus Map is my way of turning Non-Brand Data into a simple path you can follow. 
Pick one track, stick with it for 2&#8211;4 weeks, and ship one mini project at the end.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/subscribe?"><span>Subscribe now</span></a></p><h1><strong>What you&#8217;ll get</strong></h1><ul><li><p>SQL track for real analysis work</p></li><li><p>Python + ML track for practical modelling</p></li><li><p>RAG track for building question-answering systems on documents</p></li></ul><p>Each track includes:</p><ul><li><p>5 posts to read in order</p></li><li><p>a weekly cadence (3 sessions/week, 60 minutes/session)</p></li><li><p>One mini project with clear deliverables</p></li><li><p>What &#8220;good&#8221; looks like, so you know when to move on</p></li></ul><p>Note: the SQL Crash Course is a collaboration with Josep Ferrer (DataBites), so a few lessons are open on databites.tech.</p><p><strong>Download the PDF below for the NBD Focus Map.</strong></p><div class="file-embed-wrapper" data-component-name="FileToDOM"><div class="file-embed-container-reader"><div class="file-embed-container-top"><image class="file-embed-thumbnail-default" src="https://substackcdn.com/image/fetch/$s_!0Cy0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack.com%2Fimg%2Fattachment_icon.svg"></image><div class="file-embed-details"><div class="file-embed-details-h1">Nbd Focus Map Updated May 2026</div><div class="file-embed-details-h2">17.1KB &#8729; PDF file</div></div><a class="file-embed-button wide" href="https://www.nb-data.com/api/v1/file/ae45bb0e-6bf3-43cc-b1c3-6e10466a2839.pdf"><span class="file-embed-button-text">Download</span></a></div><a class="file-embed-button narrow" href="https://www.nb-data.com/api/v1/file/ae45bb0e-6bf3-43cc-b1c3-6e10466a2839.pdf"><span 
class="file-embed-button-text">Download</span></a></div></div><p></p><p>If you'd like, reply and let me know which track you&#8217;re starting with.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/nbd-focus-map-free-pdf/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/nbd-focus-map-free-pdf/comments"><span>Leave a comment</span></a></p><h2>After you finish a track</h2><p>If you finish one track and you have a notebook, repo, or write-up, you are already ahead of most people.</p><p>The next step is making it sharper and more reusable.</p><h3><strong>1) Turn it into a portfolio-ready artifact</strong></h3><p>I&#8217;m packaging a <strong>Portfolio Rubric Toolkit</strong> to help you score your project, spot what is missing, and decide what to fix first.</p><p><strong>Portfolio Rubric Toolkit (upgrade your project in 30&#8211;60 minutes):</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://cornelliusyudhawijay.gumroad.com/l/otdloq&quot;,&quot;text&quot;:&quot;Portfolio Rubric Kit&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://cornelliusyudhawijay.gumroad.com/l/otdloq"><span>Portfolio Rubric Kit</span></a></p><h3><strong>2) Keep momentum with guided paths and templates</strong></h3><p>If you prefer a more structured approach, the paid tier is built around member-only deep dives, reusable templates, and guided paths through the archive.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/subscribe?"><span>Subscribe 
now</span></a></p><p>When you finish one track, paid members can continue with the <strong><a href="https://www.nb-data.com/p/nbd-reading-vault-paid-guided-paths">Reading Vault</a></strong> and <strong><a href="https://www.nb-data.com/p/template-pack-index-paid">Template Pack Index</a></strong><a href="https://www.nb-data.com/p/template-pack-index-paid"> </a>for deeper guided paths and reusable assets.</p>]]></content:encoded></item><item><title><![CDATA[The Portfolio Rubric Data Science Hiring Managers Use]]></title><description><![CDATA[A practical hiring-manager style framework that you can use]]></description><link>https://www.nb-data.com/p/the-portfolio-rubric-data-science</link><guid isPermaLink="false">https://www.nb-data.com/p/the-portfolio-rubric-data-science</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Tue, 20 Jan 2026 10:42:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!41BT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!41BT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!41BT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!41BT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg 848w, https://substackcdn.com/image/fetch/$s_!41BT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!41BT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!41BT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg" width="1312" height="736" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:736,&quot;width&quot;:1312,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:89473,&quot;alt&quot;:&quot;The Portfolio Rubric Data Science Hiring Managers Use&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/185147236?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Portfolio Rubric Data Science Hiring Managers Use" title="The Portfolio Rubric Data Science Hiring Managers Use" 
srcset="https://substackcdn.com/image/fetch/$s_!41BT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg 424w, https://substackcdn.com/image/fetch/$s_!41BT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg 848w, https://substackcdn.com/image/fetch/$s_!41BT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!41BT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2456a78-087d-497c-9ed0-d2da93bf2dac_1312x736.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Image generated with ideogram.ai</figcaption></figure></div><p>Picture this scenario: You spend weeks polishing a Kaggle competition notebook with immaculate code, fancy plots, and a near-perfect model. You feel confident. Then, in an interview, the hiring manager asks, &#8220;How would this work with messy real data? Where is the business decision here?&#8221; You scramble for an answer. Awkward silence. The truth is that Kaggle taught you how to compete, not how to solve real business problems. In fact, &#8220;most recruiters don&#8217;t care about your Kaggle rank.&#8221; <strong>They care about something else entirely.</strong></p><p>Too many data science portfolios list projects that impress on the surface but fail to deliver real value. As a hiring manager who&#8217;s screened dozens of candidates, I have noticed a persistent gap between what candidates showcase and what teams actually need. The typical portfolio is just a list of projects, but what hiring managers are looking for is evidence of impact, realism, and critical thinking behind them.</p><p>The good news is that you don&#8217;t need a dozen fancy projects to stand out. You need the right qualities in whichever projects you present. Below, I&#8217;ll share the rubric hiring managers use to evaluate data science portfolios: a simple scoring framework covering the five areas that matter for hiring.</p><p>We&#8217;ll also look at common mistakes to avoid and quick fixes to upgrade your existing portfolio. 
By the end, you&#8217;ll understand exactly what hiring managers are looking for and how to demonstrate it in your portfolio.</p><p>Let&#8217;s get into it.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Common Portfolio Mistakes (What to Avoid)</h2><p>Even experienced data scientists fall into some classic portfolio traps. Before we discuss what to do right, let&#8217;s highlight what not to do. Here are some common mistakes that cause portfolios to miss the mark:</p><ul><li><p><strong>Using Only Toy or Overused Datasets</strong>: Relying on Titanic survival predictions or Iris classification projects shows a lack of originality. Recruiters have seen these portfolios thousands of times, and a collection full of such washed-out projects will bore them. It also indicates you haven&#8217;t worked with realistic data. An industry insider said, &#8220;I hate seeing people use common Kaggle datasets like Titanic or Iris. Instead, try to scrape your own data or find unique sources.&#8221; Overall, if your data is pre-cleaned and common, it doesn&#8217;t demonstrate your ability to handle real-world data quirks.</p></li><li><p><strong>No Clear Problem or Purpose:</strong>&nbsp;Failing to define a business question or real-world purpose is a common mistake. A portfolio project like &#8220;I built a neural network to classify images&#8221; without context won&#8217;t impress hiring managers. They want to know why you did it, whether it solves a meaningful problem or was just a class assignment. If you can&#8217;t explain the problem and its significance, it shows a lack of business thinking. 
Many portfolios fail not due to technical skill but because they don&#8217;t communicate value. Avoid projects without a narrative of who benefits or what decisions can be made. For example, don&#8217;t say &#8220;it was a bootcamp group project&#8221; when asked why you chose it; show that you addressed a problem you care about or an issue relevant to a business.</p></li><li><p><strong>Metrics Over Impact (Model-Centric Thinking)</strong>: Many candidates focus on achieving 99% accuracy in a model and present that as the victory, but hiring managers are wary of this. Focusing on metrics instead of business value is a mistake. For example, a churn prediction model with an AUC of 0.94 sounds good but has little value if it mostly flags customers who no longer use the product. A narrow focus on metrics often means ignoring whether the solution solves the core problem. Employers want you to deliver value, so don&#8217;t just brag about high scores; show you understand the &#8220;so what?&#8221; of your results.</p></li><li><p><strong>Ignoring Deployment and Next Steps:</strong> A common mistake is treating projects as standalone exercises. Creating a model isn&#8217;t enough; its value lies in deployment and usage. If your projects don&#8217;t mention how the model would be implemented or used, or what the next steps are after building it, hiring managers notice. Most employers won&#8217;t consider you a serious candidate for senior roles without knowledge of deployment, retraining, or monitoring. You don&#8217;t need to be an MLOps expert, but showing deployment ideas (even hypothetical) is crucial.</p></li><li><p><strong>Poor Presentation and Communication:</strong> Many portfolios are hard to read, lacking README files, commentary, or visualizations, making it tiring for reviewers to understand your project. A hiring manager said, &#8220;I hate seeing a big mess of code with no README or TL;DR.&#8221; Without a clear summary or visual results, your work can be overlooked. 
Hiring managers glance through dozens of portfolios, so if yours doesn&#8217;t quickly highlight key points, it likely won&#8217;t hold attention. Another manager said, &#8220;I ignore side projects unless they show real impact... I need impact, not just some model.&#8221; Showing impact also means presenting insights simply&#8212;pictures or charts often communicate more effectively than words. Portfolios without an executive summary, well-designed graphs, or an organized story are at a disadvantage.</p></li></ul><p>Avoid these pitfalls:</p><ol><li><p>Steer clear of overly common projects,</p></li><li><p>Always define the problem and the value,</p></li><li><p>Think beyond accuracy alone,</p></li><li><p>Consider real-world deployment,</p></li><li><p>Present your work clearly. </p></li></ol><p>Next, we&#8217;ll discuss exactly what hiring managers are looking for instead and how to ensure your portfolio checks those boxes.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/the-portfolio-rubric-data-science?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/the-portfolio-rubric-data-science?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h2>What Hiring Managers Are Actually Looking For</h2><p>So what does impress a hiring manager in a data science portfolio? In a word: impact. </p><p>They want to see proof that you can apply data science to solve real problems, not just toy exercises. From my experience, this boils down to a few key qualities. Specifically, they evaluate portfolios across five dimensions that map to real on-the-job success:</p><ul><li><p><strong>Problem Framing:</strong> Did you clearly define the problem you tackled and why it matters? 
Great portfolios start with a well-scoped question or business problem, not just a technique. (Is it a meaningful, non-trivial problem, and do you understand the context around it?)</p></li><li><p><strong>Data Realism:</strong> Did you use data that&#8217;s reflective of real-world complexity? This includes working with messy or authentic datasets, not only pristine samples. It shows you can handle real data challenges and demonstrates curiosity in sourcing data beyond the usual examples.</p></li><li><p><strong>Evaluation Rigor:</strong> How do you measure success, and how trustworthy are your results? We look for the use of proper metrics, baseline comparisons, validation techniques, and an honest assessment of model performance. In short, are you skeptical about metrics and careful about conclusions, or are you just accepting whatever accuracy pops out?</p></li><li><p><strong>Deployment Thinking:</strong> Did you consider what happens after the model is built? That means thinking about how the solution could be deployed or used in production. For example, packaging the model, building an API, or simply discussing how a business could implement your insights. This shows a &#8220;product readiness&#8221; mindset, not just academic analysis.</p></li><li><p><strong>Communication:</strong> Could someone who isn&#8217;t you understand and appreciate the project quickly? This covers the clarity of your writing, visualization of results, and overall storytelling. Great portfolios read almost like case studies: they draw the reader in, highlight key findings, and explain technical details in an accessible way. In fact, storytelling and clear communication are becoming increasingly important. Companies want data scientists who can clearly explain insights, not just write code.</p></li></ul><p>These five categories form the Portfolio Rubric that many hiring managers use to score a portfolio. Think of each as a lens through which your project is evaluated. 
If your portfolio projects excel in these areas, you&#8217;re demonstrating the qualities that truly matter on the job.</p><p>In the next sections, we&#8217;ll break down each rubric category in detail. For each category, I&#8217;ll explain why it matters in real-world terms and what distinguishes an average project from an outstanding one. I&#8217;ll even provide sample scoring criteria so you can gauge where your projects might fall.</p><p>Let&#8217;s dive into the rubric that can make your portfolio a hiring manager&#8217;s dream.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/the-portfolio-rubric-data-science/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/the-portfolio-rubric-data-science/comments"><span>Leave a comment</span></a></p><div><hr></div><h2>The Portfolio Rubric: 5 Key Evaluation Categories</h2><h3>1. Problem Framing</h3><p>Problem framing is about setting the stage. It&#8217;s answering: &#8220;What exact problem are you solving, and why does it matter?&#8221; A strong portfolio project doesn&#8217;t start with &#8220;I used X algorithm&#8221;; it starts with a clear question or objective. For example, instead of &#8220;I built a time series model,&#8221; good framing would be &#8220;I forecasted weekly sales to help a retailer manage inventory,&#8221; which is a specific problem with a business context.</p><p>In industry, choosing the right problem is half the battle. Companies need data scientists who focus on impactful questions, not just cool techniques. If a project lacks context, it &#8220;only shows your lack of business thinking&#8221;. Remember, a brilliant model solving an irrelevant problem is a wasted effort. Hiring managers look for whether you understood the purpose behind the project. 
Did you identify a stakeholder or decision-maker, and what they care about? Do you connect your results to a business outcome or insight?</p><p>For example, a candidate&#8217;s portfolio included a project &#8220;Predicting Employee Attrition.&#8221; On paper, it was a classification model with decent accuracy. But what impressed me was the framing. They introduced it as &#8220;Employee turnover prediction to inform HR retention strategies&#8221; and discussed how reducing attrition could save money. That context turned a generic model into a compelling story of business value.</p><p>How we score it (Problem Framing):</p><ul><li><p>Level 1 (Needs Improvement): The project lacks a clear question or goal. It feels like a generic exercise (e.g., &#8220;I applied X algorithm to Y data&#8221; with no further context). The reader can&#8217;t tell what problem this solves or why it&#8217;s important.</p></li><li><p>Level 2 (Good): The project defines a problem, but in a somewhat generic way or without emphasizing its importance. There&#8217;s a basic problem statement (e.g., predicting house prices), but little discussion of who benefits or what one would do with this prediction. Some context is given, but it may be shallow or assumed.</p></li><li><p>Level 3 (Excellent): The project is framed around a specific, meaningful problem with real-world context. It&#8217;s immediately clear why the problem matters (e.g., &#8220;predicting equipment failure to reduce downtime costs&#8221;). The candidate explains the background and stakes: who has this problem, what decision the analysis will inform, and how success is defined. The scope is well-defined (not too broad or vague), showing the candidate knows how to translate an ambiguous idea into a concrete data question.</p></li></ul><div><hr></div><h3>2. Data Realism</h3><p>Data realism refers to using data and approaches that mirror real-world conditions. 
This means datasets that are messy, large, or obtained from authentic sources, not just tidy CSVs everyone&#8217;s seen before. It also means demonstrating data wrangling and an understanding of data quality, rather than assuming data is perfect.</p><p>In industry, data is often messy or incomplete. Using only clean, toy datasets (like common Kaggle or classroom sets) doesn&#8217;t prove you can handle real data challenges. Recruiters know anyone can run a model on Titanic or Iris; that doesn&#8217;t make you stand out. Relying on such projects may cause recruiters to ignore you, as your portfolio shows a lack of creativity. Instead, sourcing interesting datasets or demonstrating how you managed missing values, outliers, or scaling shows initiative and practical skill. A hiring manager suggests scraping your own dataset or seeking rarer datasets, rather than recycling common examples.</p><p>Imagine two candidates. Alice uses the Titanic dataset but writes as if she&#8217;s helping a cruise company improve safety, discussing the dataset&#8217;s limitations (e.g., a sample of historical passengers) and how she&#8217;d gather more current data. Bob uses the Titanic dataset and just builds a classifier with 99% accuracy (on a cleaned dataset where missing ages were already handled). Alice is demonstrating data realism; Bob is not. We&#8217;re more likely to interview Alice because she&#8217;s thinking like a professional dealing with real data problems.</p><p>How we score it (Data Realism):</p><ul><li><p>Level 1 (Needs Improvement): Uses only small, common datasets with no evidence of data cleaning or exploration. It appears the data was taken &#8220;as is&#8221; from a textbook or Kaggle, with no mention of missing values, anomalies, or domain specifics. No data sourcing effort is shown (the data fell into their lap). 
This suggests the candidate might struggle when faced with untidy real-world data.</p></li><li><p>Level 2 (Good): Uses a reasonable dataset and shows some data cleaning or feature engineering, but nothing beyond the ordinary. The dataset might still be a common one, but the project at least acknowledges data issues (e.g., &#8220;had to handle class imbalance by ...&#8221; or &#8220;combined two data sources&#8221;). There is evidence that the candidate can do basic wrangling and is aware of data limitations, though they may not have sought out truly novel data.</p></li><li><p>Level 3 (Excellent): The project uses realistic data, possibly self-collected or multi-source. The candidate may have accessed an API, scraped data, or used an open data portal to gather new data. They clearly document the data cleaning steps and challenges (e.g., handling missing data, skewed distributions, or integrating data from different sources). The approach shows creativity in data sourcing and thoroughness in preparation. It&#8217;s evident they didn&#8217;t just accept the data at face value &#8211; they explored its quality and shaped the data to fit the problem, just like one must do on real teams. This level demonstrates that the person can handle the messiness of actual business data.</p></li></ul><div><hr></div><h3>3. Evaluation Rigor</h3><p>Evaluation rigor means critically assessing your model&#8217;s performance and results. It&#8217;s about using the right metrics, establishing baselines, properly validating the model, and interpreting the outcomes with a skeptical eye. Rigorous evaluation answers: &#8220;How do I know my solution actually works, and how well?&#8221;</p><p>In real projects, a model is only as good as the evidence that it works for the intended purpose. Hiring managers want to see that you didn&#8217;t just run to a conclusion, but that you actually tested it. 
This includes simple things like comparing against a baseline (e.g., how does your model compare to a naive guess or the current solution?) and using appropriate metrics for the problem (e.g., using precision/recall for a class-imbalanced problem instead of just accuracy). It also means checking for overfitting, using cross-validation or a test set, and analyzing errors or uncertainty.</p><p>Portfolios that demonstrate evaluation rigor stand out. For instance, if you built a classifier, did you also provide a confusion matrix and discuss false positives versus false negatives in context? If you did time-series forecasting, did you hold out the last few months as a true future test? If you optimized a metric, did you consider whether that metric truly reflects business success? Showing such thoroughness tells me I can trust your work.</p><p>I recall a portfolio project on image classification where the candidate not only reported accuracy but also deliberately added noise to the images to test robustness and plotted how performance dropped. They also compared their CNN to a simpler logistic regression as a baseline. This thorough evaluation was a green flag, as it demonstrated scientific thinking and honesty about the model&#8217;s capabilities.</p><p>How we score it (Evaluation Rigor):</p><ul><li><p>Level 1 (Needs Improvement): The project shows minimal evaluation. Perhaps only a single metric (like accuracy) is reported without context, or results are presented without validation (e.g., performance only on the training set or a cherry-picked example). There&#8217;s no baseline or benchmark mentioned. You can&#8217;t tell whether 90% accuracy is good or trivial, given the problem. No discussion of errors, assumptions, or limitations is present. 
This indicates a lack of critical thinking about the results.</p></li><li><p>Level 2 (Good): The project uses standard evaluation practices, e.g., a train/test split or cross-validation, and reports at least one appropriate metric on a held-out set. A baseline may be mentioned (e.g., &#8220;our model beats a random guess, which was 50%&#8221; or &#8220;improves over a simple linear model by 10%&#8221;). The candidate likely includes some error analysis or at least mentions possible improvements. However, the evaluation might still miss deeper issues (for example, reporting overall accuracy without noting that one class was often mispredicted, or not considering how an unbalanced dataset might skew the metric). Solid effort, but not deeply probing.</p></li><li><p>Level 3 (Excellent): The project demonstrates thorough evaluation, considering multiple performance metrics such as precision, recall, ROC AUC, and domain-specific measures. It establishes a clear baseline, checks for overfitting (train vs. validation curves), uses methods such as cross-validation, performs sensitivity analysis, and tests edge cases. The candidate interprets results in context, asking whether the performance is acceptable (e.g., &#8220;A recall of 0.7 means 30% of true cases are missed; is that acceptable in healthcare?&#8221;), and acknowledges limitations like data bias or assumptions. This rigor reflects a mindset of skepticism and decision-making focus, which we value.</p></li></ul><h3>4. Deployment Thinking</h3><p>Deployment thinking evaluates whether you considered how the project&#8217;s solution would be used in a real-world environment. In other words, did you think beyond the notebook? This could include creating a simple web app for your model, following proper coding practices to package your project, or simply writing a paragraph on how you&#8217;d deploy and monitor the model in production.</p><p>In modern data science teams, the work doesn&#8217;t stop at insight or model training. 
Models often need to be integrated into products or processes. While you might not personally build the entire production pipeline, you will collaborate with engineers or hand off your work for implementation. Hiring managers, therefore, value awareness of deployment considerations. If two candidates both build a decent model, but one also sets up a Flask API or describes a plan for real-time inference, that candidate demonstrates ownership and practicality. It shows they think about reliability, data pipelines, or user impact, not just modeling.</p><p>In fact, not showing any hint of deployment or next steps can be costly. As noted earlier, employers might question how you&#8217;ll add value: &#8220;you can stick your model you-know-where if it&#8217;s not usable in production.&#8221; We test for a mindset of &#8220;production readiness,&#8221; which means you anticipate the steps needed to make your work actually run and keep running in a live setting.</p><p>Consider a portfolio project that predicts stock prices. Deployment considerations might include: &#8220;I scheduled this script to run daily and send an email alert with the latest prediction.&#8221; Or &#8220;I deployed the model as a web app using Streamlit so you can try it live.&#8221; Or even, &#8220;In a real company, I&#8217;d retrain this model weekly as new data comes in and monitor the prediction error over time to detect drift.&#8221; These elements turn a good project into a great one by showing you understand the full lifecycle of ML products.</p><p>How we score it (Deployment Thinking):</p><ul><li><p>Level 1 (Needs Improvement): There&#8217;s no mention of deployment or next steps. The project ends at model evaluation. It&#8217;s as if the analysis exists in isolation. There&#8217;s no consideration of how the model could be consumed (e.g., by an application or user) or maintained. 
The code may be very prototype-like (hard-coded paths, not modular), suggesting it&#8217;s not ready to be used elsewhere. This suggests the candidate hasn&#8217;t considered real-world implementation.</p></li><li><p>Level 2 (Good): The project shows some awareness of deployment, though it&#8217;s minimal. Perhaps the candidate structured their code well or included instructions for running the project. They might mention in passing how the model could be used (e.g., &#8220;this model could be deployed as a REST API&#8221; or &#8220;in production we&#8217;d need to retrain periodically&#8221;). There may not be an actual deployment, but there&#8217;s at least recognition of the need. Alternatively, they might have taken a small step, such as containerizing the project or using a simple dashboard to present results. It&#8217;s a hint that they know deployment is important, even if they haven&#8217;t fully demonstrated it.</p></li><li><p>Level 3 (Excellent): The project actively incorporates deployment considerations or deliverables. The candidate might have a live demo (a web app, an interactive notebook, or a command-line tool) that others can interact with. Or they provide a link to a GitHub repo with a Dockerfile and clear instructions, showing you could actually run their solution easily. They discuss how they would handle tasks such as model monitoring, data updates, scaling, and integration with existing systems. In essence, they treat the project as a product rather than just an analysis. This aligns with what many hiring managers quietly look for, which is a sense of &#8220;ownership &amp; reliability&#8221; in how you approach your work. </p></li></ul><h3>5. Communication</h3><p>Communication in a portfolio context refers to how well you convey the story and results of your project to others. 
This includes the organization of your content, the explanations you provide (in writing or orally if presented), the visualizations you choose, and the overall storytelling of the project. Essentially, if someone (technical or not) reviews your project, do they quickly grasp the what, why, and how of it?</p><p>Data science is a team sport, and often a business-facing one. It&#8217;s not enough to have a brilliant analysis; you must also communicate insights to colleagues, managers, or clients. Hiring managers, therefore, seek evidence of strong communication skills in your portfolio. A well-documented project with clear Markdown cells, captioned charts, and a logical flow demonstrates that you can explain your work. </p><p>In practical terms, good communication in a portfolio might mean having a README summary for each project, highlighting key results upfront, and guiding the reader through your process step by step. It also means tailoring the depth of technical detail to your audience. For example, explaining technical concepts or decisions in plain language where appropriate, and using visuals to make results intuitive. A common mistake (as we saw) is to dump a lot of code or an overly complex notebook without context. Instead, present a narrative such as what problem you tackled, what the data told you, what model you built, how well it worked, and what it means.</p><p>I once reviewed a candidate&#8217;s portfolio project on customer segmentation. They included a before-and-after chart showing how their clustering grouped customers in a new way, along with a short paragraph: &#8220;Segment 3 (orange in the chart) had the highest lifetime value but low engagement. This insight suggested a targeted re-engagement campaign for this group.&#8221; That single visualization and explanation conveyed the essence of the project&#8217;s impact. 
Compare that to someone who might simply say, &#8220;I did K-means clustering on customers,&#8221; and dump the cluster centers without context. The former demonstrates excellent communication and understanding of the audience&#8217;s needs.</p><p>How we score it (Communication):</p><ul><li><p>Level 1 (Needs Improvement): The project is difficult to follow. There&#8217;s little to no documentation or explanation. Perhaps the code is there, but the why behind the steps is not explained. Visualizations, if any, are poorly labeled or absent. There&#8217;s no clear introduction or conclusion. Essentially, only someone with the candidate&#8217;s exact knowledge could decipher the project. This raises concerns about how the person would communicate on a team or to stakeholders.</p></li><li><p>Level 2 (Good): The project is understandable with some effort. The candidate provides a decent structure (e.g., sections in a notebook, some comments or markdown explaining each part). They include a couple of key plots or tables and attempt to summarize findings. However, the narrative might not be as tight or engaging as it could be. Perhaps the introduction or conclusions are brief, or the visuals could be clearer. It&#8217;s adequate, but it might not fully grab a non-expert audience or highlight the most important insights upfront.</p></li><li><p>Level 3 (Excellent): The project is structured like a compelling story or case study, starting with a brief overview of the problem and approach, then explaining the methodology step-by-step in simple terms, and concluding with clear recommendations. Visuals are used effectively to support the findings, each accompanied by a descriptive title or caption. The writing is concise, with minimal jargon and clear explanations, making it accessible to both technical and business audiences. Attention to design details, such as bullet points or bold highlights, emphasizes key insights. 
This allows reviewers to quickly grasp the main points or explore detailed reasoning, demonstrating that the candidate can communicate effectively across functions and deliver meaningful insights beyond just modeling. Ideally, the project is engaging, inspires care for the outcome, and showcases strong storytelling skills.</p></li></ul><p>Those are the five rubric categories: </p><ol><li><p>Problem Framing, </p></li><li><p>Data Realism, </p></li><li><p>Evaluation Rigor, </p></li><li><p>Deployment Thinking,</p></li><li><p>Communication. </p></li></ol><p>Great portfolios hit high marks in all five. </p><p>Next, let&#8217;s see how you can apply this rubric to improve your own portfolio, even if you&#8217;re short on time.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share Non-Brand Data&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share Non-Brand Data</span></a></p><div><hr></div><h2>Quick Fix: How to Upgrade Your Portfolio in 2 Hours</h2><p>You might be thinking, &#8220;This is great for planning new projects, but what about the projects I already have?&#8221; The good news is that you can improve an existing portfolio relatively quickly by addressing the rubric criteria. Here&#8217;s a step-by-step game plan (which you can literally do in an afternoon) to level up your portfolio using the rubric:</p><ol><li><p><strong>Pick Your Best Project (Focus Your Effort):</strong> If you have many projects, identify one or two that are most relevant to the roles you want or that best showcase your skills. It&#8217;s often better to have one polished, rubric-aligned case study than five mediocre ones. 
Hiring managers spend maybe 2-3 minutes on an initial portfolio glance, so you want your standout work front and center.</p></li><li><p><strong>Add a Clear Problem Statement:</strong> Open your project README or the top section of your notebook. Write a one-paragraph intro that answers: What problem are you solving and why should anyone care? Be specific and use plain language. For example, &#8220;Goal: Reduce customer churn by predicting which users are likely to cancel, so the marketing team can intervene with retention offers.&#8221; This immediately frames the project in terms of business value and hooks the reader.</p></li><li><p><strong>Provide Context on Data:</strong> Next, describe the dataset and why it&#8217;s appropriate (or if it has limitations). If it&#8217;s a well-known dataset, acknowledge that and perhaps note how you treated it: &#8220;We use the Telco Customer Churn dataset (IBM Sample) as a proxy for a subscription business&#8217;s customer data. In a real scenario, we&#8217;d gather recent customer activity and subscription details;  the sample data serves as a stand-in, which I augmented by adding some noise to simulate real-world imperfections.&#8221; If you did any data cleaning or feature engineering, summarize that process. This shows Data Realism. Even a sentence like &#8220;Note: I had to impute missing values for tenure and handle class imbalance (only ~26% churned) by oversampling&#8221; demonstrates that you dealt with data issues (and gets you points on the rubric).</p></li><li><p><strong>Insert a Baseline and Evaluation Highlights:</strong> Scan your results section. Have you indicated what performance you&#8217;d consider good, or what you&#8217;re comparing against? If not, add a baseline. This could be as simple as &#8220;For context, if we predict &#8216;no churn&#8217; for everyone, we&#8217;d get ~74% accuracy (the non-churn rate). 
Our model achieves 85% accuracy, significantly improving over this baseline.&#8221; Also, ensure you mention the key metric(s) and why they make sense: &#8220;We optimize for recall, to catch as many churning customers as possible, because missing a churning customer is costlier than a false alarm in this context.&#8221; This addition shows Evaluation Rigor and aligns your project with real decision-making. It can be done with just a few lines of text or an extra table comparing metrics.</p></li><li><p><strong>Discuss Deployment (Even Hypothetically):</strong> Add a short section titled &#8220;Deployment &amp; Next Steps&#8221; at the end. Here, write a few sentences about how this model/analysis could be used in production or what you&#8217;d do next if this were a real company project. For example: &#8220;If this model were deployed in a company, I&#8217;d set it up as a daily batch job scoring each active user. Users predicted to churn would be fed into a CRM tool for the marketing team to target. I&#8217;d also monitor the model&#8217;s precision/recall over time &#8211; if performance drifts, I&#8217;d retrain with fresh data. For real deployment, we&#8217;d need to integrate with the data warehouse and ensure predictions happen within a week of a customer&#8217;s last activity.&#8221; You don&#8217;t have to actually deploy it, but showing you understand the path to production is immensely valuable. It shows that you think like someone who wants to drive results, not just build models.</p></li><li><p><strong>Tighten the Narrative and Presentation:</strong> Now polish the communication. Ensure your notebook or report has a logical flow: Introduction &#8594; Data &#8594; Method &#8594; Results &#8594; Conclusion. Add or refine chart titles and axis labels to be more descriptive (e.g., &#8220;Churn Rate by Tenure Group&#8221; instead of &#8220;Figure1.png&#8221;). 
Consider adding an illustrative plot if you haven&#8217;t (for instance, a bar chart of feature importances or a sample of predictions vs. actual outcomes). Also, write a short conclusion that reiterates the key insight or performance: &#8220;Conclusion: The model can identify ~50% of churners with 80% precision, which could significantly reduce churn if retention offers are effective. The factors of contract length and monthly charges were the strongest churn predictors, aligning with business intuition.&#8221; This helps a skimmer get the point and shows you understand the results in context. Finally, if the project is on GitHub, make sure the README highlights these points and not just the technical setup.</p></li><li><p><strong>Apply the Same Steps to Other Projects (if time permits):</strong> If you have another project that&#8217;s relevant (say one NLP project and one computer vision project to showcase range), repeat the above steps there. But remember, quality over quantity. It&#8217;s better to fully refurbish one project than half-fix three of them. You want at least one example that scores high on all rubric dimensions.</p></li></ol><p>Within about 2 hours, using the steps above, you can transform a bland, academic project into a professional case study. The key is reframing your existing work to speak the language of hiring managers and to highlight business value. </p><div><hr></div><h2><strong>&#128640; Premium Content: Portfolio Rubric Toolkit (Downloadable)</strong></h2><p></p><p><em>The section below is for Premium subscribers and includes downloadable tools &amp; examples to help you implement the ideas above. Upgrade to access the full toolkit.</em> &#128640;</p>
      <p>
          <a href="https://www.nb-data.com/p/the-portfolio-rubric-data-science">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Best Financial Data APIs in 2026]]></title><description><![CDATA[A Practical Comparison to Access the Financial Information You Need]]></description><link>https://www.nb-data.com/p/best-financial-data-apis-in-2026</link><guid isPermaLink="false">https://www.nb-data.com/p/best-financial-data-apis-in-2026</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Mon, 12 Jan 2026 06:47:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BJ_q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b84538-cf77-4692-92e1-25faa2148cfa_1400x1013.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BJ_q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b84538-cf77-4692-92e1-25faa2148cfa_1400x1013.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BJ_q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b84538-cf77-4692-92e1-25faa2148cfa_1400x1013.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BJ_q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b84538-cf77-4692-92e1-25faa2148cfa_1400x1013.jpeg 848w, https://substackcdn.com/image/fetch/$s_!BJ_q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b84538-cf77-4692-92e1-25faa2148cfa_1400x1013.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!BJ_q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b84538-cf77-4692-92e1-25faa2148cfa_1400x1013.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BJ_q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b84538-cf77-4692-92e1-25faa2148cfa_1400x1013.jpeg" width="1400" height="1013" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26b84538-cf77-4692-92e1-25faa2148cfa_1400x1013.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1013,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!BJ_q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b84538-cf77-4692-92e1-25faa2148cfa_1400x1013.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BJ_q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b84538-cf77-4692-92e1-25faa2148cfa_1400x1013.jpeg 848w, https://substackcdn.com/image/fetch/$s_!BJ_q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b84538-cf77-4692-92e1-25faa2148cfa_1400x1013.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!BJ_q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b84538-cf77-4692-92e1-25faa2148cfa_1400x1013.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@campaign_creators?utm_source=medium&amp;utm_medium=referral">Campaign Creators</a> on <a href="https://unsplash.com/?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure></div><p>Financial data APIs provide a direct, programmatic pathway to market information. 
They support a wide range of applications, including financial analytics, research workflows, automated reporting, and data-driven products. In 2026, the ecosystem is mature and competitive. Many providers offer overlapping capabilities on the surface, yet practical differences can affect implementation quality and long-term maintainability.</p><p>In practice, providers vary in their market coverage and the continuity of their historical datasets. They also differ in the depth and standardization of fundamental data, the availability of real-time or streaming access, and the constraints imposed by rate limits. The quality of documentation, integration tools, and licensing terms also influences whether an API remains usable after initial testing. Given these differences, we need to determine which financial data APIs best fit our needs.</p><p>In this article, we will review the best financial data APIs available in 2026. The objective is to present clear trade-offs rather than a single universal solution. For each provider, I summarize the types of data you can retrieve, the key advantages and disadvantages, and the contexts in which the API is appropriate.</p><p>Curious about it? 
Let&#8217;s get into it.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2><strong>Financial Modeling Prep (FMP)</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9NRt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd985b5da-22b2-48e5-8261-a609d2bb05b7_1400x809.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9NRt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd985b5da-22b2-48e5-8261-a609d2bb05b7_1400x809.png 424w, https://substackcdn.com/image/fetch/$s_!9NRt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd985b5da-22b2-48e5-8261-a609d2bb05b7_1400x809.png 848w, https://substackcdn.com/image/fetch/$s_!9NRt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd985b5da-22b2-48e5-8261-a609d2bb05b7_1400x809.png 1272w, https://substackcdn.com/image/fetch/$s_!9NRt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd985b5da-22b2-48e5-8261-a609d2bb05b7_1400x809.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!9NRt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd985b5da-22b2-48e5-8261-a609d2bb05b7_1400x809.png" width="1400" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d985b5da-22b2-48e5-8261-a609d2bb05b7_1400x809.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Best Financial Data APIs in 2026&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Best Financial Data APIs in 2026" title="Best Financial Data APIs in 2026" srcset="https://substackcdn.com/image/fetch/$s_!9NRt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd985b5da-22b2-48e5-8261-a609d2bb05b7_1400x809.png 424w, https://substackcdn.com/image/fetch/$s_!9NRt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd985b5da-22b2-48e5-8261-a609d2bb05b7_1400x809.png 848w, https://substackcdn.com/image/fetch/$s_!9NRt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd985b5da-22b2-48e5-8261-a609d2bb05b7_1400x809.png 1272w, https://substackcdn.com/image/fetch/$s_!9NRt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd985b5da-22b2-48e5-8261-a609d2bb05b7_1400x809.png 1456w" sizes="100vw"></picture></div></a></figure></div><h3><strong>Overview</strong></h3><p><a href="https://site.financialmodelingprep.com/">Financial Modeling Prep (FMP)</a> is a financial data API provider that focuses on broad market coverage and practical endpoints for application development. 
It offers market prices and fundamental datasets through a straightforward REST interface.</p><h3><strong>Advantages</strong></h3><ul><li><p><strong>All-in-one coverage:</strong> Provides pricing data, company fundamentals, macroeconomic indicators, and market news in one place.</p></li><li><p><strong>Rich endpoint selection:</strong> Includes many ready-to-use endpoints, reducing the need to stitch data together from multiple sources.</p></li><li><p><strong>Strong developer usability:</strong> Clear documentation and a predictable API structure make integration and iteration efficient.</p></li><li><p><strong>Product-oriented fit:</strong> Well-suited for building stock screeners, analytics dashboards, and research pipelines that combine price and fundamental data.</p></li></ul><h3><strong>Disadvantages</strong></h3><ul><li><p><strong>Limited free tier:</strong> The free plan is suitable for testing and light usage, but its rate limits and reduced data depth restrict its usefulness beyond that.</p></li><li><p><strong>Advanced access requires upgrades:</strong> Certain datasets and higher-capacity usage are reserved for higher-priced tiers.</p></li></ul><h3><strong>Best for</strong></h3><p>Teams or individuals who want a single API that can support both market data and fundamentals for analysis and product development.</p><h3><strong>Ideal starting plan</strong></h3><p>Start with the free tier to validate endpoints and data fit, then move to the entry-level paid tier once you need consistent throughput or deeper coverage.</p><div><hr></div><h2><strong>Alpha Vantage</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FWWm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f963651-86f8-4ed7-9ced-4c94a08d1193_1400x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!FWWm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f963651-86f8-4ed7-9ced-4c94a08d1193_1400x720.png 424w, https://substackcdn.com/image/fetch/$s_!FWWm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f963651-86f8-4ed7-9ced-4c94a08d1193_1400x720.png 848w, https://substackcdn.com/image/fetch/$s_!FWWm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f963651-86f8-4ed7-9ced-4c94a08d1193_1400x720.png 1272w, https://substackcdn.com/image/fetch/$s_!FWWm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f963651-86f8-4ed7-9ced-4c94a08d1193_1400x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FWWm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f963651-86f8-4ed7-9ced-4c94a08d1193_1400x720.png" width="1400" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f963651-86f8-4ed7-9ced-4c94a08d1193_1400x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Best Financial Data APIs in 2026&quot;,&quot;title&quot;:&quot;Best Financial Data APIs in 2026&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Best Financial Data APIs in 2026" title="Best Financial Data APIs in 2026" 
srcset="https://substackcdn.com/image/fetch/$s_!FWWm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f963651-86f8-4ed7-9ced-4c94a08d1193_1400x720.png 424w, https://substackcdn.com/image/fetch/$s_!FWWm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f963651-86f8-4ed7-9ced-4c94a08d1193_1400x720.png 848w, https://substackcdn.com/image/fetch/$s_!FWWm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f963651-86f8-4ed7-9ced-4c94a08d1193_1400x720.png 1272w, https://substackcdn.com/image/fetch/$s_!FWWm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f963651-86f8-4ed7-9ced-4c94a08d1193_1400x720.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3><strong>Overview</strong></h3><p><a href="https://www.alphavantage.co/">Alpha Vantage</a> is a comprehensive financial data API platform designed for both retail investors and institutional trading systems. It provides extensive coverage across equities, options, forex, cryptocurrencies, and macroeconomic datasets, combining real-time market feeds with deep historical data and built-in analytics.</p><p>A key differentiator is that Alpha Vantage sources data from licensed market data sources such as NASDAQ and the Options Price Reporting Authority (OPRA), enabling access to professional-grade market data infrastructure through a simple API interface. With millisecond-level real-time updates and more than 20 years of historical price and fundamental data, the platform supports everything from educational projects to institutional-scale algorithmic trading systems.</p><h3><strong>Advantages</strong></h3><ul><li><p><strong>Institutional-grade data licensing:</strong> Alpha Vantage is officially licensed by major market data authorities, including NASDAQ and OPRA, ensuring reliable and compliant access to equity and options data streams. 
This makes it suitable for professional trading environments that require high-quality exchange-sourced data.</p></li><li><p><strong>Real-time and low-latency market data:</strong> The platform delivers millisecond-level real-time data, enabling use cases such as algorithmic trading, quantitative research, and automated portfolio monitoring where latency and accuracy are critical.</p></li><li><p><strong>Extensive historical coverage:</strong> Alpha Vantage offers 20+ years of historical price data across global markets, along with long-range fundamental datasets. This depth allows analysts and quantitative researchers to perform robust backtesting and long-horizon market studies.</p></li><li><p><strong>Built-in technical analysis library:</strong> The API includes a large catalogue of technical indicators that can be retrieved directly through API calls. This significantly reduces engineering overhead for traders and developers who would otherwise need to implement indicator calculations themselves.</p></li><li><p><strong>Accessible architecture for all users:</strong> Despite its institutional capabilities, Alpha Vantage maintains a clean, developer-friendly API structure that allows beginners, independent traders, and large trading firms to integrate financial data pipelines quickly.</p></li></ul><h3><strong>Disadvantages</strong></h3><ul><li><p><strong>Free tier constraints:</strong> As with other providers, certain features are excluded from Alpha Vantage&#8217;s free tier for compliance and anti-bot purposes.</p></li></ul><h3><strong>Best for</strong></h3><p>Alpha Vantage is particularly well-suited for:</p><ul><li><p>Retail investors and independent developers building trading tools or investment dashboards</p></li><li><p>Quantitative researchers requiring long historical datasets for backtesting</p></li><li><p>Algorithmic and institutional trading systems that need real-time exchange-licensed data feeds</p></li><li><p>Fintech platforms seeking a 
single API for market data, fundamentals, and analytics</p></li></ul><div><hr></div><h2><strong>EOD Historical Data (EODHD)</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OTn8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf3085f-f1be-4cc8-8c06-d281768b2150_1400x729.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OTn8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf3085f-f1be-4cc8-8c06-d281768b2150_1400x729.png 424w, https://substackcdn.com/image/fetch/$s_!OTn8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf3085f-f1be-4cc8-8c06-d281768b2150_1400x729.png 848w, https://substackcdn.com/image/fetch/$s_!OTn8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf3085f-f1be-4cc8-8c06-d281768b2150_1400x729.png 1272w, https://substackcdn.com/image/fetch/$s_!OTn8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf3085f-f1be-4cc8-8c06-d281768b2150_1400x729.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OTn8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf3085f-f1be-4cc8-8c06-d281768b2150_1400x729.png" width="1400" height="729" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cdf3085f-f1be-4cc8-8c06-d281768b2150_1400x729.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:729,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Best Financial Data APIs in 2026&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Best Financial Data APIs in 2026" title="Best Financial Data APIs in 2026" srcset="https://substackcdn.com/image/fetch/$s_!OTn8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf3085f-f1be-4cc8-8c06-d281768b2150_1400x729.png 424w, https://substackcdn.com/image/fetch/$s_!OTn8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf3085f-f1be-4cc8-8c06-d281768b2150_1400x729.png 848w, https://substackcdn.com/image/fetch/$s_!OTn8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf3085f-f1be-4cc8-8c06-d281768b2150_1400x729.png 1272w, https://substackcdn.com/image/fetch/$s_!OTn8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdf3085f-f1be-4cc8-8c06-d281768b2150_1400x729.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3><strong>Overview</strong></h3><p><a href="https://eodhd.com/">EOD Historical Data (EODHD)</a> is a market data provider known for broad international exchange coverage and long historical time series. 
It combines end-of-day and intraday pricing with fundamentals and several optional datasets that support more advanced workflows.</p><h3><strong>Advantages</strong></h3><ul><li><p><strong>Strong global coverage with a long history:</strong> Offers broad exchange support and historical depth suitable for long-horizon analysis and backtesting.</p></li><li><p><strong>High value on paid tiers:</strong> Paid plans are competitively priced for the amount of data provided, especially when you need global markets and deeper history.</p></li><li><p><strong>Solid fundamentals and add-ons:</strong> Includes company fundamentals and supports additional datasets such as options and macroeconomic indicators, depending on the plan.</p></li><li><p><strong>Practical integration options:</strong> Supports bulk-style access for efficient retrieval, provides some streaming capabilities, and offers spreadsheet-friendly integrations for Excel and Google Sheets.</p></li></ul><h3><strong>Disadvantages</strong></h3><ul><li><p><strong>The free tier is primarily for evaluation:</strong> Request limits are restrictive, so it is best treated as a connectivity and fit check rather than a long-term solution.</p></li><li><p><strong>Real-time depth is uneven:</strong> Real-time availability and latency can differ by asset class and region, with stronger coverage typically in U.S. 
markets than in many international markets.</p></li></ul><h3><strong>Best for</strong></h3><p>Projects that require global market coverage and long historical datasets, especially when you want substantial value from paid plans.</p><div><hr></div><h2><strong>Finnhub</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eynQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2780e149-0099-4fe5-9df6-fd68c7715d2a_1400x548.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eynQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2780e149-0099-4fe5-9df6-fd68c7715d2a_1400x548.png 424w, https://substackcdn.com/image/fetch/$s_!eynQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2780e149-0099-4fe5-9df6-fd68c7715d2a_1400x548.png 848w, https://substackcdn.com/image/fetch/$s_!eynQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2780e149-0099-4fe5-9df6-fd68c7715d2a_1400x548.png 1272w, https://substackcdn.com/image/fetch/$s_!eynQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2780e149-0099-4fe5-9df6-fd68c7715d2a_1400x548.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eynQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2780e149-0099-4fe5-9df6-fd68c7715d2a_1400x548.png" width="1400" height="548" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2780e149-0099-4fe5-9df6-fd68c7715d2a_1400x548.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:548,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Best Financial Data APIs in 2026&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Best Financial Data APIs in 2026" title="Best Financial Data APIs in 2026" srcset="https://substackcdn.com/image/fetch/$s_!eynQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2780e149-0099-4fe5-9df6-fd68c7715d2a_1400x548.png 424w, https://substackcdn.com/image/fetch/$s_!eynQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2780e149-0099-4fe5-9df6-fd68c7715d2a_1400x548.png 848w, https://substackcdn.com/image/fetch/$s_!eynQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2780e149-0099-4fe5-9df6-fd68c7715d2a_1400x548.png 1272w, https://substackcdn.com/image/fetch/$s_!eynQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2780e149-0099-4fe5-9df6-fd68c7715d2a_1400x548.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3><strong>Overview</strong></h3><p><a href="https://finnhub.io/">Finnhub</a> is a financial data API that combines market quotes with news and event-oriented datasets. 
It is widely used for prototyping and product development because it offers accessible pricing and a relatively broad feature set.</p><h3><strong>Advantages</strong></h3><ul><li><p><strong>Generous free-tier limits:</strong> The free plan typically provides sufficient request capacity to support meaningful experimentation and early-stage prototypes.</p></li><li><p><strong>Balanced dataset mix:</strong> Provides a practical combination of quotes, news, sentiment signals, and market calendars, which helps you build context-aware applications.</p></li><li><p><strong>WebSocket support:</strong> Provides streaming access through WebSockets, enabling lower-latency updates without relying exclusively on polling.</p></li></ul><h3><strong>Disadvantages</strong></h3><ul><li><p><strong>Shallower fundamentals:</strong> Fundamental coverage is generally less comprehensive than that of providers that focus heavily on financial statements and deep company datasets.</p></li><li><p><strong>Paid plans for full access:</strong> Longer historical depth and certain premium endpoints are gated behind paid tiers, particularly for more advanced or higher-volume use cases.</p></li></ul><h3><strong>Best for</strong></h3><p>Rapid prototyping and application development that benefits from combining price data with news, sentiment, and event calendars.</p><div><hr></div><h2><strong>Tiingo</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZpQT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4468ebfd-2c78-49d3-978d-671aed7b5c6b_1400x704.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!ZpQT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4468ebfd-2c78-49d3-978d-671aed7b5c6b_1400x704.png 424w, https://substackcdn.com/image/fetch/$s_!ZpQT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4468ebfd-2c78-49d3-978d-671aed7b5c6b_1400x704.png 848w, https://substackcdn.com/image/fetch/$s_!ZpQT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4468ebfd-2c78-49d3-978d-671aed7b5c6b_1400x704.png 1272w, https://substackcdn.com/image/fetch/$s_!ZpQT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4468ebfd-2c78-49d3-978d-671aed7b5c6b_1400x704.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZpQT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4468ebfd-2c78-49d3-978d-671aed7b5c6b_1400x704.png" width="1400" height="704" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4468ebfd-2c78-49d3-978d-671aed7b5c6b_1400x704.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:704,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Best Financial Data APIs in 2026&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Best Financial Data APIs in 2026" title="Best Financial Data APIs in 2026" 
srcset="https://substackcdn.com/image/fetch/$s_!ZpQT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4468ebfd-2c78-49d3-978d-671aed7b5c6b_1400x704.png 424w, https://substackcdn.com/image/fetch/$s_!ZpQT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4468ebfd-2c78-49d3-978d-671aed7b5c6b_1400x704.png 848w, https://substackcdn.com/image/fetch/$s_!ZpQT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4468ebfd-2c78-49d3-978d-671aed7b5c6b_1400x704.png 1272w, https://substackcdn.com/image/fetch/$s_!ZpQT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4468ebfd-2c78-49d3-978d-671aed7b5c6b_1400x704.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3><strong>Overview</strong></h3><p><a href="https://www.tiingo.com/">Tiingo</a> is a financial data provider that emphasizes clean historical market data and straightforward API access. It is commonly used in research and backtesting workflows, particularly by individual developers and small teams.</p><h3><strong>Advantages</strong></h3><ul><li><p><strong>Substantial value for individuals:</strong> Paid plans are typically affordable given the included data and request limits, making Tiingo attractive to solo builders.</p></li><li><p><strong>High-quality historical end-of-day data:</strong> Tiingo is well-regarded for stable, consistent EOD datasets that support backtesting and long-horizon analysis.</p></li><li><p><strong>Practical fundamentals for U.S. equities:</strong> On paid tiers, Tiingo provides solid fundamental coverage of U.S. companies, often sufficient for screening and basic factor research.</p></li></ul><h3><strong>Disadvantages</strong></h3><ul><li><p><strong>Less comprehensive as an all-in-one source:</strong> Tiingo is not positioned as a single source for macroeconomic and commodities data, so you may need supplementary sources depending on your requirements.</p></li><li><p><strong>Real-time and intraday are not the core focus:</strong> While intraday data may be available, it is not as central or as feature-complete as it is with providers optimized for streaming or high-frequency use cases.</p></li></ul><h3><strong>Best for</strong></h3><p>Individuals and small teams who want reliable historical market data for analysis and backtesting, with reasonable U.S. 
fundamentals on a cost-effective paid plan.</p><div><hr></div><h2><strong>Twelve Data</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uL_I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073362e2-faa8-41fb-b680-3aac81f30c01_1400x811.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uL_I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073362e2-faa8-41fb-b680-3aac81f30c01_1400x811.png 424w, https://substackcdn.com/image/fetch/$s_!uL_I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073362e2-faa8-41fb-b680-3aac81f30c01_1400x811.png 848w, https://substackcdn.com/image/fetch/$s_!uL_I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073362e2-faa8-41fb-b680-3aac81f30c01_1400x811.png 1272w, https://substackcdn.com/image/fetch/$s_!uL_I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073362e2-faa8-41fb-b680-3aac81f30c01_1400x811.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uL_I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073362e2-faa8-41fb-b680-3aac81f30c01_1400x811.png" width="1400" height="811" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/073362e2-faa8-41fb-b680-3aac81f30c01_1400x811.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:811,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Best Financial Data APIs in 2026&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Best Financial Data APIs in 2026" title="Best Financial Data APIs in 2026" srcset="https://substackcdn.com/image/fetch/$s_!uL_I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073362e2-faa8-41fb-b680-3aac81f30c01_1400x811.png 424w, https://substackcdn.com/image/fetch/$s_!uL_I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073362e2-faa8-41fb-b680-3aac81f30c01_1400x811.png 848w, https://substackcdn.com/image/fetch/$s_!uL_I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073362e2-faa8-41fb-b680-3aac81f30c01_1400x811.png 1272w, https://substackcdn.com/image/fetch/$s_!uL_I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073362e2-faa8-41fb-b680-3aac81f30c01_1400x811.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3><strong>Overview</strong></h3><p><a href="https://twelvedata.com/">Twelve Data</a> is a market data API focused on time-series access across multiple asset classes. 
It is commonly used for applications that need consistent pricing endpoints for stocks, foreign exchange, and cryptocurrencies.</p><h3><strong>Advantages</strong></h3><ul><li><p><strong>Clean multi-asset time-series API:</strong> Provides a uniform way to retrieve historical and intraday price data across stocks, FX, and crypto, simplifying implementation.</p></li><li><p><strong>Strong developer experience:</strong> Documentation is generally clear, integration is straightforward, and common workflows are well-supported.</p></li><li><p><strong>Built-in indicators:</strong> Includes technical indicators that reduce the effort required to add analytics to a prototype or dashboard.</p></li></ul><h3><strong>Disadvantages</strong></h3><ul><li><p><strong>Paid tiers may feel expensive:</strong> Pricing can be less attractive when compared with alternatives that offer broader datasets at similar cost levels.</p></li><li><p><strong>Limited depth beyond prices:</strong> Fundamental coverage and macroeconomic datasets are typically less extensive than those from all-in-one providers.</p></li></ul><h3><strong>Best for</strong></h3><p>Projects that primarily require reliable multi-asset price time series, a developer-friendly API, and convenient technical indicators.</p><div><hr></div><h2><strong>Marketstack</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ksB7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af7706e-3977-4328-a94f-6e4b2a3c71c1_1400x725.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ksB7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af7706e-3977-4328-a94f-6e4b2a3c71c1_1400x725.png 424w, 
https://substackcdn.com/image/fetch/$s_!ksB7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af7706e-3977-4328-a94f-6e4b2a3c71c1_1400x725.png 848w, https://substackcdn.com/image/fetch/$s_!ksB7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af7706e-3977-4328-a94f-6e4b2a3c71c1_1400x725.png 1272w, https://substackcdn.com/image/fetch/$s_!ksB7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af7706e-3977-4328-a94f-6e4b2a3c71c1_1400x725.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ksB7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af7706e-3977-4328-a94f-6e4b2a3c71c1_1400x725.png" width="1400" height="725" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1af7706e-3977-4328-a94f-6e4b2a3c71c1_1400x725.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:725,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Best Financial Data APIs in 2026&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Best Financial Data APIs in 2026" title="Best Financial Data APIs in 2026" srcset="https://substackcdn.com/image/fetch/$s_!ksB7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af7706e-3977-4328-a94f-6e4b2a3c71c1_1400x725.png 424w, 
https://substackcdn.com/image/fetch/$s_!ksB7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af7706e-3977-4328-a94f-6e4b2a3c71c1_1400x725.png 848w, https://substackcdn.com/image/fetch/$s_!ksB7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af7706e-3977-4328-a94f-6e4b2a3c71c1_1400x725.png 1272w, https://substackcdn.com/image/fetch/$s_!ksB7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af7706e-3977-4328-a94f-6e4b2a3c71c1_1400x725.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3><strong>Overview</strong></h3><p><a href="https://marketstack.com/">Marketstack</a> is a market data API focused on global equity pricing, with coverage across many stock exchanges. It is designed for simple retrieval of real-time and historical stock prices via a lightweight REST interface.</p><h3><strong>Advantages</strong></h3><ul><li><p><strong>Simple global stock pricing access:</strong> Works well when your primary need is equity quotes and historical prices across multiple markets, without complex endpoint structures.</p></li><li><p><strong>Affordable entry-level paid tier:</strong> Paid plans are typically priced for basic application use cases, making them practical for small dashboards and lightweight integrations.</p></li></ul><h3><strong>Disadvantages</strong></h3><ul><li><p><strong>Limited fundamentals and extended datasets:</strong> Marketstack is primarily price-oriented and offers fewer fundamentals, corporate datasets, and value-added endpoints than all-in-one providers.</p></li><li><p><strong>No integrated FX or crypto coverage:</strong> Foreign exchange and cryptocurrency data are not included in the core product and often require separate services.</p></li></ul><h3><strong>Best for</strong></h3><p>Basic applications that need straightforward global stock price data at a predictable cost, without strong requirements for fundamentals or multi-asset coverage.</p><div><hr></div><h2><strong>Polygon.io (Massive)</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ONPz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2c6eab-a483-4c4e-896a-c165fe371878_1400x643.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!ONPz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2c6eab-a483-4c4e-896a-c165fe371878_1400x643.png 424w, https://substackcdn.com/image/fetch/$s_!ONPz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2c6eab-a483-4c4e-896a-c165fe371878_1400x643.png 848w, https://substackcdn.com/image/fetch/$s_!ONPz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2c6eab-a483-4c4e-896a-c165fe371878_1400x643.png 1272w, https://substackcdn.com/image/fetch/$s_!ONPz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2c6eab-a483-4c4e-896a-c165fe371878_1400x643.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ONPz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2c6eab-a483-4c4e-896a-c165fe371878_1400x643.png" width="1400" height="643" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c2c6eab-a483-4c4e-896a-c165fe371878_1400x643.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:643,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Best Financial Data APIs in 2026&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Best Financial Data APIs in 2026" title="Best Financial Data APIs in 2026" 
srcset="https://substackcdn.com/image/fetch/$s_!ONPz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2c6eab-a483-4c4e-896a-c165fe371878_1400x643.png 424w, https://substackcdn.com/image/fetch/$s_!ONPz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2c6eab-a483-4c4e-896a-c165fe371878_1400x643.png 848w, https://substackcdn.com/image/fetch/$s_!ONPz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2c6eab-a483-4c4e-896a-c165fe371878_1400x643.png 1272w, https://substackcdn.com/image/fetch/$s_!ONPz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c2c6eab-a483-4c4e-896a-c165fe371878_1400x643.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3><strong>Overview</strong></h3><p><a href="https://massive.com/">Polygon.io</a> (now positioned under the &#8220;Massive&#8221; brand) is a market data provider focused on high-performance access to U.S. market data. It is best known for low-latency delivery, streaming support, and granular datasets suitable for trading-oriented workloads.</p><h3><strong>Advantages</strong></h3><ul><li><p><strong>Strong U.S. real-time and high-frequency coverage:</strong> Well-suited for use cases that require timely quotes and detailed market activity in U.S. equities.</p></li><li><p><strong>High performance and streaming:</strong> Provides WebSocket streaming and fast REST endpoints, which support responsive applications and real-time monitoring.</p></li><li><p><strong>Granular historical depth:</strong> With the appropriate plan, it offers tick-level history and detailed aggregates that are valuable for advanced backtesting and microstructure analysis.</p></li></ul><h3><strong>Disadvantages</strong></h3><ul><li><p><strong>U.S.-first scope:</strong> Coverage is primarily U.S.-focused, so it is not the best fit for projects that require broad global exchange coverage.</p></li><li><p><strong>Cost scales quickly for premium access:</strong> Real-time entitlements and extensive historical depth are typically available only on higher-priced tiers, which can be more expensive than general-purpose APIs.</p></li></ul><h3><strong>Best for</strong></h3><p>Trading-oriented applications that require high-performance, real-time U.S. 
market data and benefit from streaming and tick-level history.</p><div><hr></div><h2><strong>Conclusion</strong></h2><p>The financial data API landscape in 2026 is strong, but no single provider is best for every scenario. The most practical approach is to select an API that matches the breadth and reliability you need, then confirm that its rate limits, historical depth, and licensing terms fit how you plan to use the data.</p><p>Here are the financial data APIs you should know in 2026:</p><ul><li><p><strong>Financial Modeling Prep (FMP):</strong> A broad, all-in-one API that combines market prices with fundamentals and additional datasets for building complete financial applications.</p></li><li><p><strong>Alpha Vantage:</strong> A simple API that is well-suited for learning and small projects, especially if you want built-in technical indicators.</p></li><li><p><strong>EOD Historical Data (EODHD):</strong> A strong option for global exchange coverage and long historical datasets, with solid paid-plan value and useful add-ons.</p></li><li><p><strong>Finnhub:</strong> A developer-friendly API with generous free-tier limits and a practical mix of quotes, news, sentiment, and market calendars.</p></li><li><p><strong>Tiingo:</strong> A cost-effective choice for clean end-of-day historical data and backtesting, with good U.S. fundamentals on paid tiers.</p></li><li><p><strong>Twelve Data:</strong> A clean multi-asset time series API for stocks, FX, and crypto, designed for straightforward integration and indicator-driven workflows.</p></li><li><p><strong>Marketstack:</strong> A lightweight API for global stock price data with affordable entry pricing, best for basic applications.</p></li><li><p><strong>Polygon.io (Massive):</strong> A high-performance provider focused on real-time and high-frequency U.S. 
market data, including streaming and granular history.</p></li></ul><p>I hope this has helped!</p>]]></content:encoded></item><item><title><![CDATA[Batch Screening Fundamentals with Financial Modeling Prep and Streamlit]]></title><description><![CDATA[Build a Lightweight Stock Screener For Your Fundamental Analysis]]></description><link>https://www.nb-data.com/p/batch-screening-fundamentals-with</link><guid isPermaLink="false">https://www.nb-data.com/p/batch-screening-fundamentals-with</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Wed, 07 Jan 2026 14:16:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dIlI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd27bb505-b22a-4f85-a301-0907c2102fcb_1600x1067.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dIlI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd27bb505-b22a-4f85-a301-0907c2102fcb_1600x1067.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dIlI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd27bb505-b22a-4f85-a301-0907c2102fcb_1600x1067.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dIlI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd27bb505-b22a-4f85-a301-0907c2102fcb_1600x1067.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dIlI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd27bb505-b22a-4f85-a301-0907c2102fcb_1600x1067.jpeg 
1272w, https://substackcdn.com/image/fetch/$s_!dIlI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd27bb505-b22a-4f85-a301-0907c2102fcb_1600x1067.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dIlI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd27bb505-b22a-4f85-a301-0907c2102fcb_1600x1067.jpeg" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d27bb505-b22a-4f85-a301-0907c2102fcb_1600x1067.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dIlI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd27bb505-b22a-4f85-a301-0907c2102fcb_1600x1067.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dIlI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd27bb505-b22a-4f85-a301-0907c2102fcb_1600x1067.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dIlI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd27bb505-b22a-4f85-a301-0907c2102fcb_1600x1067.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!dIlI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd27bb505-b22a-4f85-a301-0907c2102fcb_1600x1067.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@goshua13?utm_source=medium&amp;utm_medium=referral">Joshua Aragon</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure></div><p>Batch screening matters because most real-world financial workflows are not about understanding one company. 
They are about narrowing down a universe. In practice, you start with a watchlist, an index, or a sector set, then ask simple questions such as: which companies have strong profitability, manageable leverage, and healthy cash generation? That first pass turns an overwhelming list of tickers into a short list you can actually research.</p><p>The challenge is that screening requires repetition. If you fetch fundamentals one company at a time, you end up rewriting the same code path for every symbol: call the endpoint, parse the JSON, extract a few fields, compute ratios, and handle missing data. Doing this manually in notebooks does not scale, and it is easy to introduce inconsistencies across analyses.</p><p>In this article, we will build a small batch-screening workflow on top of <a href="https://site.financialmodelingprep.com/developer/docs">Financial Modeling Prep</a>&#8217;s stable fundamentals endpoints. The workflow is designed to run even on the free tier: it pulls the data and wraps it in a lightweight Streamlit UI so you can screen companies interactively and export the results for deeper analysis.</p><p>Let&#8217;s get into it!</p><div><hr></div><h3>Foundation</h3><p>You can access the entire code used in this tutorial in this repository.</p><p>Batch screening is basically the process of &#8216;shortlisting&#8217; in fundamental analysis. Rather than analyzing one company at a time, you begin with a list of tickers and systematically apply the same criteria: retrieve fundamentals, calculate several ratios, filter, and rank. 
Performing this manually in notebooks can become repetitive and may lead to inconsistencies.</p><p>A small, structured project helps you standardize the workflow, reuse parsing and ratio logic, and produce a clean output table you can export or integrate into a dashboard.</p><p>In this article, we will build a minimal batch fundamentals screener that does three things:</p><ul><li><p><strong>Fetch</strong> the latest annual fundamentals for a list of tickers</p></li><li><p><strong>Compute</strong> simple screening metrics (for example, ROE and debt-to-equity)</p></li><li><p><strong>Filter and display</strong> the shortlist in a lightweight <strong>Streamlit UI</strong>, with an option to export results as CSV.</p></li></ul><p>This is not a complete analytics platform. It is a compact workflow you can reuse whenever you want to screen a set of companies before deeper analysis.</p><div><hr></div><h3>The Data Source</h3><p>All data comes from Financial Modeling Prep&#8217;s stable API, using a single base URL:</p><pre><code>https://financialmodelingprep.com/stable</code></pre><p>Each function is expressed as an endpoint on this base URL, with parameters passed via query strings. 
For this screener, we only use a small subset of endpoints focused on company fundamentals:</p><ul><li><p><strong>Income statement (</strong><code>income-statement</code><strong>)</strong>: revenue, net income, and other income statement fields</p></li><li><p><strong>Balance sheet (</strong><code>balance-sheet-statement</code><strong>)</strong>: total assets, total liabilities, and equity fields</p></li><li><p><strong>Cash flow statement (</strong><code>cash-flow-statement</code><strong>)</strong>: operating cash flow and other cash flow items</p></li></ul><p>Across these endpoints, we use consistent parameters:</p><ul><li><p><code>symbol</code>: the ticker (e.g., AAPL)</p></li><li><p><code>period</code>: <code>annual</code> (to keep the example simple and consistent)</p></li><li><p><code>limit</code>: usually <code>1</code> for &#8220;latest snapshot&#8221; screening (you can extend later to multi-year stability checks)</p></li></ul><p>These three statements are sufficient to reconstruct a basic snapshot of a company&#8217;s fundamentals and compute simple screening ratios.</p><div><hr></div><h3>What the Batch Screener Does</h3><p>Instead of exposing REST endpoints like the previous microservice, this project produces a screening table.</p><p>Given a list of tickers, it will:</p><ol><li><p>Pull the latest annual income statement, balance sheet, and cash flow statement for each ticker</p></li><li><p>Compute a few simple metrics, such as:</p></li></ol><ul><li><p><strong>ROE</strong> = netIncome / totalEquity</p></li><li><p><strong>Debt-to-Equity</strong> = totalLiabilities / totalEquity</p></li><li><p><strong>Cash flow health</strong> using operatingCashFlow (for example, requiring it to be positive)</p></li></ul><p>3. Apply thresholds to filter the universe into a shortlist</p><p>4. 
Display results in Streamlit and allow CSV export for follow-up analysis</p><div><hr></div><h3>Project Architecture</h3><p>We keep the project small and modular:</p><pre><code>fmp_batch_screening/
&#9500;&#9472; app/
&#9474;  &#9500;&#9472; __init__.py
&#9474;  &#9500;&#9472; config.py           # loads env vars (API key + base URL)
&#9474;  &#9500;&#9472; bulk_client.py      # fetches statements per ticker (batch via loop)
&#9474;  &#9500;&#9472; screening.py        # computes ratios + applies filters
&#9474;  &#9492;&#9472; streamlit_app.py    # Streamlit UI (inputs, sliders, table, export)
&#9500;&#9472; requirements.txt
&#9492;&#9472; .env.example</code></pre><p>At a high level, the flow is as follows:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BfHv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8b9267c-08d5-4647-86ce-7c9f1f6e2e6d_1600x143.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BfHv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8b9267c-08d5-4647-86ce-7c9f1f6e2e6d_1600x143.png 424w, https://substackcdn.com/image/fetch/$s_!BfHv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8b9267c-08d5-4647-86ce-7c9f1f6e2e6d_1600x143.png 848w, https://substackcdn.com/image/fetch/$s_!BfHv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8b9267c-08d5-4647-86ce-7c9f1f6e2e6d_1600x143.png 1272w, https://substackcdn.com/image/fetch/$s_!BfHv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8b9267c-08d5-4647-86ce-7c9f1f6e2e6d_1600x143.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BfHv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8b9267c-08d5-4647-86ce-7c9f1f6e2e6d_1600x143.png" width="1456" height="130" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8b9267c-08d5-4647-86ce-7c9f1f6e2e6d_1600x143.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:130,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BfHv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8b9267c-08d5-4647-86ce-7c9f1f6e2e6d_1600x143.png 424w, https://substackcdn.com/image/fetch/$s_!BfHv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8b9267c-08d5-4647-86ce-7c9f1f6e2e6d_1600x143.png 848w, https://substackcdn.com/image/fetch/$s_!BfHv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8b9267c-08d5-4647-86ce-7c9f1f6e2e6d_1600x143.png 1272w, https://substackcdn.com/image/fetch/$s_!BfHv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8b9267c-08d5-4647-86ce-7c9f1f6e2e6d_1600x143.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div><hr></div><h3>Building the Batch Screener</h3><p>We will now build the batch screening system step by step, from the dependency list to the running Streamlit app.</p><h4>Step 1: Define dependencies (<code>requirements.txt</code>)</h4><p>Before writing any code, we want to declare the project dependencies. 
This keeps the environment reproducible and makes it easy for anyone to install and run the screener.</p><p>Create a <code>requirements.txt</code> in the project root:</p><pre><code>requests
python-dotenv
pandas
streamlit</code></pre><p>Install them from your terminal:</p><pre><code>pip install -r requirements.txt</code></pre><p>Each package has a specific role:</p><ul><li><p><code>requests</code> handles HTTP calls to the FMP API.</p></li><li><p><code>python-dotenv</code> loads your <code>.env</code> file into environment variables at runtime.</p></li><li><p><code>pandas</code> gives you a table structure (DataFrame) that is well suited to screening, sorting, and filtering.</p></li><li><p><code>streamlit</code> lets you turn the batch workflow into a simple UI without building a full web app.</p></li></ul><div><hr></div><h4>Step 2: Configure environment variables (<code>.env</code>)</h4><p>Next, create a <code>.env</code> file in the project root. This is where you store your API key and base URL. The goal is to keep credentials out of source code and make configuration consistent across scripts.</p><p>Create <code>.env</code>:</p><pre><code>FMP_API_KEY=your_fmp_api_key_here
FMP_BASE_URL=https://financialmodelingprep.com/stable</code></pre><p>Here is what each setting does:</p><ul><li><p><code>FMP_API_KEY</code> will be injected into each API request as <code>apikey=...</code>.</p></li><li><p><code>FMP_BASE_URL</code> becomes the single source of truth for endpoint construction.</p></li><li><p>By using a <code>.env</code> file, you can switch keys or URLs without touching any code.</p></li></ul><div><hr></div><h4>Step 3: Centralize config in <code>app/config.py</code></h4><p>Instead of reading environment variables in every file, we centralize configuration in one place. This keeps the rest of the codebase clean and avoids duplication.</p><p>Create <code>app/config.py</code>:</p><pre><code>import os
from dotenv import load_dotenv

load_dotenv()
FMP_API_KEY = os.getenv("FMP_API_KEY")
FMP_BASE_URL = os.getenv(
    "FMP_BASE_URL",
    "https://financialmodelingprep.com/stable",
).rstrip("/")
if not FMP_API_KEY:
    raise RuntimeError(
        "FMP_API_KEY is not set. Please configure it in your .env file."
    )</code></pre><p>Let&#8217;s break down what the code above does:</p><ul><li><p><code>load_dotenv()</code> reads your <code>.env</code> file and loads all variables into the environment.</p></li><li><p><code>os.getenv("FMP_API_KEY")</code> retrieves the API key for use elsewhere.</p></li><li><p><code>FMP_BASE_URL</code> has a default fallback, and <code>.rstrip("/")</code> ensures the URL does not end with <code>/</code>. This avoids issues like <code>.../stable//income-statement</code> when we later join paths.</p></li><li><p>The <code>RuntimeError</code> acts as an early &#8220;fail fast&#8221; check so you don&#8217;t waste time debugging missing configuration later.</p></li></ul><p>This file becomes a shared dependency across the rest of the project.</p><div><hr></div><h4>Step 4: Build the batch fundamentals fetcher (<code>app/bulk_client.py</code>)</h4><p>The &#8220;batch&#8221; problem is not about one API call. It is about applying the same extraction logic consistently across many tickers. Here, we isolate all interactions with FMP into one module that:</p><ol><li><p>fetches the latest annual statements for one ticker, then</p></li><li><p>loops across many tickers and builds a DataFrame.</p></li></ol><p>Create <code>app/bulk_client.py</code>:</p><pre><code>from typing import Any, Dict, List, Optional
import time
import requests
import pandas as pd
from app.config import FMP_API_KEY, FMP_BASE_URL

def fetch_latest_statements(symbol: str) -&gt; Dict[str, Any]:
    """
    Fetch the latest annual income statement, balance sheet, and cash flow
    for a single symbol using stable endpoints.
    """
    symbol = symbol.upper()

    def _get(endpoint: str, extra_params: Optional[Dict[str, Any]] = None) -&gt; List[Dict[str, Any]]:
        params: Dict[str, Any] = {
            "symbol": symbol,
            "apikey": FMP_API_KEY,
            "period": "annual",
            "limit": 1,
        }
        if extra_params:
            params.update(extra_params)
        url = f"{FMP_BASE_URL}/{endpoint}"
        resp = requests.get(url, params=params, timeout=30)
        if not resp.ok:
            raise RuntimeError(
                f"FMP API error ({endpoint}) for {symbol}: "
                f"{resp.status_code} {resp.text[:200]}"
            )
        data = resp.json()
        # Normalize the payload so callers always receive a list of dicts
        if isinstance(data, list):
            return data
        if isinstance(data, dict):
            return [data]
        return []

    income_list = _get("income-statement")
    balance_list = _get("balance-sheet-statement")
    cashflow_list = _get("cash-flow-statement")
    income = income_list[0] if income_list else {}
    balance = balance_list[0] if balance_list else {}
    cashflow = cashflow_list[0] if cashflow_list else {}
    return {
        "symbol": symbol,
        "date": income.get("date") or balance.get("date") or cashflow.get("date"),
        "revenue": income.get("revenue"),
        "netIncome": income.get("netIncome"),
        "totalAssets": balance.get("totalAssets"),
        "totalLiabilities": balance.get("totalLiabilities"),
        "totalEquity": balance.get("totalStockholdersEquity") or balance.get("totalEquity"),
        "operatingCashFlow": cashflow.get("operatingCashFlow"),
    }

def fetch_fundamentals_for_symbols(
    symbols: List[str],
    sleep_seconds: float = 0.25,
) -&gt; pd.DataFrame:
    """
    Loop over a list of tickers and fetch the latest annual statements for each.
    Returns one DataFrame row per symbol.
    """
    cleaned = [s.strip().upper() for s in symbols if s.strip()]
    cleaned = list(dict.fromkeys(cleaned))  # de-duplicate, preserve order
    rows: List[Dict[str, Any]] = []
    for sym in cleaned:
        try:
            rows.append(fetch_latest_statements(sym))
        except Exception as exc:
            print(f"[WARN] Failed for {sym}: {exc}")
        time.sleep(sleep_seconds)
    return pd.DataFrame(rows) if rows else pd.DataFrame()</code></pre><p>This module has two layers: a single-symbol fetcher and a batch loop.</p><p>1) <code>fetch_latest_statements(symbol)</code></p><ul><li><p>Uppercases the ticker so <code>aapl</code> becomes <code>AAPL</code>.</p></li><li><p>Defines <code>_get(endpoint, extra_params)</code> as a local helper:<br>- Builds query parameters (<code>symbol</code>, <code>period=annual</code>, <code>limit=1</code>, plus <code>apikey</code>).<br>- Constructs the URL using the stable base: <code>f"{FMP_BASE_URL}/{endpoint}"</code>.<br>- Sends a GET request with <code>requests.get(...)</code>.<br>- If the request fails, raises a clear error showing the endpoint, the status code, and the first 200 characters of the response body.<br>- Normalizes responses so you always get a list of dictionaries.</p></li><li><p>Calls <code>_get(...)</code> three times, once per endpoint:<br>- <code>income-statement</code><br>- <code>balance-sheet-statement</code><br>- <code>cash-flow-statement</code></p></li><li><p>Picks the first result from each list (because <code>limit=1</code>) and flattens only the fields we care about into a single dictionary.</p></li></ul><p>That flattening step is important: instead of returning three raw JSON blobs, we return one consistent &#8220;row&#8221; suitable for a DataFrame.</p><p>2) <code>fetch_fundamentals_for_symbols(symbols)</code></p><ul><li><p>Cleans the input list:<br>- removes empty values<br>- uppercases everything<br>- de-duplicates (so you don&#8217;t waste calls)</p></li><li><p>Loops over each symbol and calls <code>fetch_latest_statements</code>.</p></li><li><p>If one symbol fails, it prints a warning but continues the batch. 
This matters in real screening because a single broken ticker should not halt the entire run.</p></li><li><p>Sleeps briefly between calls to reduce the chance of rate-limit issues.</p></li><li><p>Returns a DataFrame with one row per ticker.</p></li></ul><p>At this point, you&#8217;ve already converted &#8220;many API calls&#8221; into one table you can analyze.</p><div><hr></div><h4>Step 5: Compute ratios and build the screening rules (<code>app/screening.py</code>)</h4><p>Raw statements are helpful, but screening is usually based on ratios. Here, we compute a minimal set of metrics from the fetched fields and apply filters to shortlist companies.</p><p>Create <code>app/screening.py</code>:</p><pre><code>from typing import Dict, Any, Tuple, List
import pandas as pd

from app.bulk_client import fetch_fundamentals_for_symbols

DEFAULT_THRESHOLDS: Dict[str, Any] = {
    "min_roe": 0.15,
    "max_debt_to_equity": 0.5,
    "min_operating_cf": 0.0,
}

def load_universe_with_ratios(symbols: List[str]) -&gt; pd.DataFrame:
    """
    Fetch fundamentals and compute:
      - ROE = netIncome / totalEquity
      - Debt-to-Equity = totalLiabilities / totalEquity
    """
    df = fetch_fundamentals_for_symbols(symbols)
    if df.empty:
        return df
    def safe_div(num, den):
        try:
            if den is None or den == 0:
                return None
            return float(num) / float(den)
        except (TypeError, ZeroDivisionError):
            return None
    df["roe"] = [
        safe_div(ni, eq) for ni, eq in zip(df.get("netIncome"), df.get("totalEquity"))
    ]
    df["debt_to_equity"] = [
        safe_div(liab, eq)
        for liab, eq in zip(df.get("totalLiabilities"), df.get("totalEquity"))
    ]
    return df

def apply_screen(
    df: pd.DataFrame,
    min_roe: float,
    max_debt_to_equity: float,
    min_operating_cf: float,
) -&gt; Tuple[pd.DataFrame, pd.DataFrame]:
    """
    Apply thresholds and return (cleaned_data, shortlist).
    """
    required = ["symbol", "roe", "debt_to_equity", "operatingCashFlow"]
    missing = [c for c in required if c not in df.columns]
    if missing:
        return df, pd.DataFrame()
    df_clean = df.dropna(subset=required)
    mask = (
        (df_clean["roe"] &gt;= min_roe)
        &amp; (df_clean["debt_to_equity"] &lt;= max_debt_to_equity)
        &amp; (df_clean["operatingCashFlow"] &gt;= min_operating_cf)
    )
    shortlist = df_clean.loc[mask].copy()
    shortlist = shortlist.sort_values("roe", ascending=False)
    return df_clean, shortlist</code></pre><p>Let&#8217;s break down what happens in the code above.</p><ul><li><p><code>load_universe_with_ratios(symbols)</code>:<br>- Calls the batch client to get a fundamentals DataFrame.<br>- Defines <code>safe_div()</code> so ratio calculations do not crash when equity is missing or zero.<br>- Computes <code>roe</code> from <code>netIncome / totalEquity</code> and <code>debt_to_equity</code> from <code>totalLiabilities / totalEquity</code>.<br>- Adds those computed values as new DataFrame columns.</p></li><li><p><code>apply_screen(...)</code>:<br>- Verifies the required fields exist.<br>- Drops rows missing key metrics (because screening with <code>None</code> values is meaningless).<br>- Applies your filter rules (min ROE, max leverage, min operating cash flow).<br>- Sorts results by ROE so the strongest profitability appears at the top.</p></li></ul><p>This is the &#8220;brain&#8221; of the screener: you can keep extending it with more metrics later without touching the UI.</p><div><hr></div><h4>Step 6: Build the Streamlit UI (<code>app/streamlit_app.py</code>)</h4><p>Now we expose the batch screener as an interactive app. The user provides the tickers and screening thresholds, then gets a shortlist table and CSV export.</p><p>Create <code>app/streamlit_app.py</code>:</p><pre><code>
import os
import sys

# Ensure project root (parent of "app") is on sys.path
CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))
PROJECT_ROOT = os.path.dirname(CURRENT_DIR)
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

import streamlit as st
import pandas as pd

from app.screening import (
    load_universe_with_ratios,
    apply_screen,
    DEFAULT_THRESHOLDS,
)

st.set_page_config(
    page_title="FMP Fundamentals Screener",
    layout="wide",
)


@st.cache_data(show_spinner=True)
def get_universe_cached(symbols: tuple) -&gt; pd.DataFrame:
    # symbols is a tuple here because cache_data needs hashable args
    return load_universe_with_ratios(list(symbols))


def main():
    st.title("Batch Fundamentals Screener")
    st.write(
        "This app uses only Financial Modeling Prep endpoints that are typically "
        "available on the **free plan** (annual financial statements). "
        "You provide a list of symbols, and the app fetches the latest annual "
        "income statement, balance sheet, and cash flow to compute basic ratios "
        "such as ROE and Debt-to-Equity."
    )

    # Sidebar: symbols + criteria
    st.sidebar.header("Universe &amp; Screening Criteria")

    default_symbols = "AAPL, MSFT, GOOGL, AMZN, META, NVDA, TSLA, JPM, BAC, NFLX"

    symbols_input = st.sidebar.text_area(
        "Symbols (comma or newline separated)",
        value=default_symbols,
        help="Provide a list of tickers to screen. "
             "Example: AAPL, MSFT, GOOGL",
        height=120,
    )

    min_roe = st.sidebar.slider(
        "Minimum ROE (Net income / Equity, latest annual)",
        min_value=0.0,
        max_value=0.5,
        value=float(DEFAULT_THRESHOLDS["min_roe"]),
        step=0.01,
    )

    max_debt_to_equity = st.sidebar.slider(
        "Maximum Debt-to-Equity (Total liabilities / Equity, latest annual)",
        min_value=0.0,
        max_value=3.0,
        value=float(DEFAULT_THRESHOLDS["max_debt_to_equity"]),
        step=0.05,
    )

    min_operating_cf = st.sidebar.number_input(
        "Minimum Operating Cash Flow (latest annual, absolute)",
        value=float(DEFAULT_THRESHOLDS["min_operating_cf"]),
        step=1_000_000.0,
        format="%.0f",
        help="Set to &gt;0 to require positive operating cash flow.",
    )

    st.sidebar.markdown("---")
    st.sidebar.write("Edit the symbols and criteria, then click **Run Screening**.")

    if st.button("Run Screening"):
        # Parse symbols
        raw = symbols_input.replace("\n", ",")
        symbols = [s.strip().upper() for s in raw.split(",") if s.strip()]

        if not symbols:
            st.warning("Please provide at least one symbol.")
            return

        try:
            df_universe = get_universe_cached(tuple(symbols))
        except Exception as e:
            st.error(f"Error fetching data from FMP: {e}")
            return

        if df_universe.empty:
            st.warning("No data returned from the financial statement endpoints.")
            return

        st.subheader("Universe Preview")
        st.write(
            f"Fetched latest annual statements for **{len(df_universe)}** symbols."
        )

        st.write("Columns available (first 20):")
        st.code(", ".join(df_universe.columns.tolist()[:20]), language="text")

        df_all, df_screened = apply_screen(
            df_universe,
            min_roe=min_roe,
            max_debt_to_equity=max_debt_to_equity,
            min_operating_cf=min_operating_cf,
        )

        if df_screened.empty:
            st.warning(
                "No companies passed the current screening rules. "
                "Try relaxing the filters or inspect the raw data."
            )
            with st.expander("Show full dataset"):
                st.dataframe(df_all)
            return

        st.subheader("Screening Results")
        st.write(
            f"Companies passing the screen: **{len(df_screened)}**. "
            "Sorted by ROE descending."
        )

        # Show only the columns that are actually present in the result
        preferred_cols = [
            "symbol",
            "date",
            "roe",
            "debt_to_equity",
            "operatingCashFlow",
            "revenue",
            "netIncome",
            "totalAssets",
            "totalLiabilities",
            "totalEquity",
        ]
        display_cols = [c for c in preferred_cols if c in df_screened.columns]

        st.dataframe(df_screened[display_cols].reset_index(drop=True))

        # Simple bar chart of top N by ROE
        top_n = min(20, len(df_screened))
        chart_df = df_screened.head(top_n)
        if "symbol" in chart_df.columns and "roe" in chart_df.columns:
            st.subheader(f"Top {top_n} by ROE (latest annual)")
            st.bar_chart(
                chart_df.set_index("symbol")["roe"]
            )

        with st.expander("Download results as CSV"):
            csv = df_screened.to_csv(index=False)
            st.download_button(
                label="Download CSV",
                data=csv,
                file_name="screened_companies.csv",
                mime="text/csv",
            )

    else:
        st.info("Provide symbols in the sidebar and click **Run Screening**.")


if __name__ == "__main__":
    main()</code></pre><p>Let&#8217;s break down what happens in the code above.</p><ul><li><p>The <code>sys.path</code> block ensures imports like <code>from app.screening import ...</code> work correctly in Streamlit (because Streamlit executes the file as a script).</p></li><li><p>The sidebar captures two things:<br>- a user-defined ticker list<br>- screening thresholds (ROE, debt-to-equity, operating cash flow)</p></li><li><p><code>@st.cache_data</code> caches results for the same ticker list:<br>- if you adjust only the thresholds, Streamlit reuses the fetched data instead of calling the API again</p></li><li><p>When you click <strong>Run Screening</strong>, the app:<br>- parses tickers into a clean list<br>- fetches fundamentals and builds a DataFrame<br>- computes ratios<br>- applies screening rules<br>- renders the shortlist table and provides CSV export (plus a simple ROE chart)</p></li></ul><div><hr></div><h4>Step 7: Run the app</h4><p>From the project root:</p><pre><code>streamlit run app/streamlit_app.py</code></pre><p>Once it runs, you can:</p><ul><li><p>paste your own universe of tickers,</p></li><li><p>adjust thresholds,</p></li><li><p>export a shortlist for deeper analysis.</p></li></ul><p>That is all it takes to run the batch screening UI. Let&#8217;s take a look at the app by opening it on localhost. 
If everything runs fine, you will see a screen like the one below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qEns!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f253170-a7ec-41c6-8f32-de1d8c60823f_1600x849.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qEns!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f253170-a7ec-41c6-8f32-de1d8c60823f_1600x849.png 424w, https://substackcdn.com/image/fetch/$s_!qEns!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f253170-a7ec-41c6-8f32-de1d8c60823f_1600x849.png 848w, https://substackcdn.com/image/fetch/$s_!qEns!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f253170-a7ec-41c6-8f32-de1d8c60823f_1600x849.png 1272w, https://substackcdn.com/image/fetch/$s_!qEns!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f253170-a7ec-41c6-8f32-de1d8c60823f_1600x849.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qEns!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f253170-a7ec-41c6-8f32-de1d8c60823f_1600x849.png" width="1456" height="773" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f253170-a7ec-41c6-8f32-de1d8c60823f_1600x849.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:773,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qEns!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f253170-a7ec-41c6-8f32-de1d8c60823f_1600x849.png 424w, https://substackcdn.com/image/fetch/$s_!qEns!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f253170-a7ec-41c6-8f32-de1d8c60823f_1600x849.png 848w, https://substackcdn.com/image/fetch/$s_!qEns!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f253170-a7ec-41c6-8f32-de1d8c60823f_1600x849.png 1272w, https://substackcdn.com/image/fetch/$s_!qEns!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f253170-a7ec-41c6-8f32-de1d8c60823f_1600x849.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>On the left side, you can enter all the information for the screening criteria, while on the right side is where all the information appears after we run the screening.</p><p>The result is the preview of the data universe we acquired and the screening results.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xB7d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd263f19-0d77-47a5-ab4c-cbd069bcaadb_1600x947.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xB7d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd263f19-0d77-47a5-ab4c-cbd069bcaadb_1600x947.png 424w, 
https://substackcdn.com/image/fetch/$s_!xB7d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd263f19-0d77-47a5-ab4c-cbd069bcaadb_1600x947.png 848w, https://substackcdn.com/image/fetch/$s_!xB7d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd263f19-0d77-47a5-ab4c-cbd069bcaadb_1600x947.png 1272w, https://substackcdn.com/image/fetch/$s_!xB7d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd263f19-0d77-47a5-ab4c-cbd069bcaadb_1600x947.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xB7d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd263f19-0d77-47a5-ab4c-cbd069bcaadb_1600x947.png" width="1456" height="862" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd263f19-0d77-47a5-ab4c-cbd069bcaadb_1600x947.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:862,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xB7d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd263f19-0d77-47a5-ab4c-cbd069bcaadb_1600x947.png 424w, 
https://substackcdn.com/image/fetch/$s_!xB7d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd263f19-0d77-47a5-ab4c-cbd069bcaadb_1600x947.png 848w, https://substackcdn.com/image/fetch/$s_!xB7d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd263f19-0d77-47a5-ab4c-cbd069bcaadb_1600x947.png 1272w, https://substackcdn.com/image/fetch/$s_!xB7d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd263f19-0d77-47a5-ab4c-cbd069bcaadb_1600x947.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As additional features, we have a chart showing the company&#8217;s ROE that passes the screening and a button to download the results as CSV files.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N59-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433cc58c-9e6f-4f9e-9477-d02025d7266a_1600x924.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N59-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433cc58c-9e6f-4f9e-9477-d02025d7266a_1600x924.png 424w, https://substackcdn.com/image/fetch/$s_!N59-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433cc58c-9e6f-4f9e-9477-d02025d7266a_1600x924.png 848w, https://substackcdn.com/image/fetch/$s_!N59-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433cc58c-9e6f-4f9e-9477-d02025d7266a_1600x924.png 1272w, https://substackcdn.com/image/fetch/$s_!N59-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433cc58c-9e6f-4f9e-9477-d02025d7266a_1600x924.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N59-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433cc58c-9e6f-4f9e-9477-d02025d7266a_1600x924.png" width="1456" height="841" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/433cc58c-9e6f-4f9e-9477-d02025d7266a_1600x924.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N59-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433cc58c-9e6f-4f9e-9477-d02025d7266a_1600x924.png 424w, https://substackcdn.com/image/fetch/$s_!N59-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433cc58c-9e6f-4f9e-9477-d02025d7266a_1600x924.png 848w, https://substackcdn.com/image/fetch/$s_!N59-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433cc58c-9e6f-4f9e-9477-d02025d7266a_1600x924.png 1272w, https://substackcdn.com/image/fetch/$s_!N59-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433cc58c-9e6f-4f9e-9477-d02025d7266a_1600x924.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>That&#8217;s all you need to know to build our batch screening in FMP. 
You can always extend the metrics and add additional information you need.</p><div><hr></div><h3>Conclusion</h3><p>In this article, we built a lightweight batch fundamentals screener on top of Financial Modeling Prep&#8217;s stable API to analyze many companies within a single, consistent workflow.</p><p>By combining a small data-fetching layer, simple ratio calculations (such as ROE and debt-to-equity), and a Streamlit interface, we can quickly turn a list of tickers into a shortlist that is easy to review and export.</p><p>You can use this project as a starting point for larger screening pipelines and extend it over time with multi-year stability checks, additional metrics, caching, or deeper drill-down views for shortlisted companies.</p>]]></content:encoded></item><item><title><![CDATA[Building an Open-Source Microservice for Financial Data Retrieval with Financial Modelling Prep]]></title><description><![CDATA[Company-Fundamental Tracking Microservice that is Suitable For Your Requirements.]]></description><link>https://www.nb-data.com/p/building-an-open-source-microservice</link><guid isPermaLink="false">https://www.nb-data.com/p/building-an-open-source-microservice</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Sat, 06 Dec 2025 05:33:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JmK7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09801c0b-a10c-4b60-bbf4-c0ecb167f940_1400x788.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JmK7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09801c0b-a10c-4b60-bbf4-c0ecb167f940_1400x788.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JmK7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09801c0b-a10c-4b60-bbf4-c0ecb167f940_1400x788.jpeg 424w, https://substackcdn.com/image/fetch/$s_!JmK7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09801c0b-a10c-4b60-bbf4-c0ecb167f940_1400x788.jpeg 848w, https://substackcdn.com/image/fetch/$s_!JmK7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09801c0b-a10c-4b60-bbf4-c0ecb167f940_1400x788.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!JmK7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09801c0b-a10c-4b60-bbf4-c0ecb167f940_1400x788.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JmK7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09801c0b-a10c-4b60-bbf4-c0ecb167f940_1400x788.jpeg" width="1400" height="788" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/09801c0b-a10c-4b60-bbf4-c0ecb167f940_1400x788.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:788,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!JmK7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09801c0b-a10c-4b60-bbf4-c0ecb167f940_1400x788.jpeg 424w, https://substackcdn.com/image/fetch/$s_!JmK7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09801c0b-a10c-4b60-bbf4-c0ecb167f940_1400x788.jpeg 848w, https://substackcdn.com/image/fetch/$s_!JmK7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09801c0b-a10c-4b60-bbf4-c0ecb167f940_1400x788.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!JmK7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09801c0b-a10c-4b60-bbf4-c0ecb167f940_1400x788.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@growtika?utm_source=medium&amp;utm_medium=referral">Growtika</a> on <a href="https://unsplash.com/?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure></div><p>Financial data is one of the datasets that many companies and individuals need. It is sought after because it supports many kinds of projects, such as building investment dashboards and portfolio trackers, running valuation and scenario analysis for listed companies, or training machine learning models for financial use cases. In all of these cases, the hard part is rarely &#8220;getting the data once.&#8221; The hard part is accessing the data cleanly and consistently every time you start a new project.</p><p><a href="https://site.financialmodelingprep.com/developer/docs">Financial Modeling Prep</a>&#8217;s stable API provides a rich set of endpoints for financial fundamentals: income statements, balance sheets, cash flow statements, profiles, and more. It solves the problem of data source availability.</p><p>But there is still a hassle for developers: the APIs are relatively low-level. You have to remember the exact endpoint names, pass the proper query parameters, manage API keys in every script, and repeatedly transform the raw JSON into the handful of fields you actually need for your analysis.</p><p>This is where a small microservice comes in handy. 
Instead of remembering every FMP URL and parameter, you centralize that logic in one place and expose a few task-specific endpoints like &#8220;search companies,&#8221; &#8220;get snapshot,&#8221; and &#8220;get history.&#8221; This approach makes the data flow easier to manage and lets you customize the structure of the output.</p><p>In this article, we will build a minimal financial microservice on top of Financial Modeling Prep&#8217;s stable API. It will not replace a complete analytics platform; instead, it will provide a focused set of endpoints for any follow-up analytical process.</p><p>Let&#8217;s get into it.</p><div><hr></div><h2><strong>Foundation</strong></h2><blockquote><p><em>You can access the entire code used in this tutorial in this <a href="https://github.com/CornelliusYW/fmp_microservice_financial">repository</a>.</em></p></blockquote><p>Before we move on to the technical part, we need to understand that building a microservice on top of existing APIs offers several practical benefits.</p><ul><li><p>First, you reduce duplication. Transforming and cleaning the responses from FMP is implemented once, tested once, and shared across everything you build.</p></li><li><p>Second, you gain a single place for shared concerns. Configuration of API keys, error handling, rate limiting, and caching can all live in the microservice rather than being reimplemented ad hoc.</p></li><li><p>Third, you create a more approachable entry point for others on your team. 
For example, they can request <code>/companies/AAPL/snapshot</code> without needing to read the FMP documentation first.</p></li></ul><p>These benefits matter most when you work as a developer or data scientist who needs consistent data across all companies.</p><h3><strong>The Data Source</strong></h3><p>Let&#8217;s start building our financial microservice. We will begin by deciding which data from FMP we will use. For this project, all the data comes from Financial Modeling Prep&#8217;s stable API, where we work with a single base URL and a consistent naming pattern:</p><pre><code>https://financialmodelingprep.com/stable</code></pre><p>Every function is expressed as a specific endpoint on this base, with arguments passed in the query string.</p><p>In this microservice, we only use a small subset of what FMP offers, focusing on the core fundamentals most people need. To keep things simple, the service relies on five primary endpoints:</p><ul><li><p><strong>Company search </strong>(<code>search-symbol</code>): Lets you search by a company name or a partial ticker and returns candidates with symbols, names, exchanges, and currencies.</p></li><li><p><strong>Company profile </strong>(<code>profile</code>): Returns basic information such as company name, exchange, currency, and other metadata.</p></li><li><p><strong>Income statement </strong>(<code>income-statement</code>): Provides revenue, net income, and other income-statement fields over time.</p></li><li><p><strong>Balance sheet statement </strong>(<code>balance-sheet-statement</code>): Provides total assets, total liabilities, and other balance sheet fields.</p></li><li><p><strong>Cash flow statement </strong>(<code>cash-flow-statement</code>): Provides operating cash flow and other cash flow items.</p></li></ul><p>Depending on the endpoint, these calls accept parameters such as:</p><ul><li><p><code>symbol</code> which is the ticker (e.g., 
<code>AAPL</code>),</p></li><li><p><code>period</code> like <code>annual</code> or <code>quarterly</code>,</p></li><li><p><code>limit</code> which is the number of records you want (e.g., the last 5 years).</p></li></ul><p>This data is enough to reconstruct a basic picture of a company&#8217;s fundamentals.</p><h3><strong>What the Financial Microservice does</strong></h3><p>In this project, the microservice exposes a small, consistent REST API:</p><ul><li><p><code>GET /health</code>: basic health check.</p></li><li><p><code>GET /companies/search?q=...</code>: search companies by name/symbol.</p></li><li><p><code>GET /companies/{symbol}/snapshot</code>: latest fundamentals snapshot (revenue, net income, assets, liabilities, operating cash flow, plus basic profile).</p></li><li><p><code>GET /companies/{symbol}/history?years=N</code>: simple time series of revenue and net income for the last N annual periods.</p></li></ul><p>These endpoints abstract away the FMP URL details, the API key management, and the raw JSON shape. The service itself is a minimal version, so it does not cover complex authentication, database management, or advanced use cases.</p><h3><strong>Project architecture</strong></h3><p>For the project architecture, we will follow the structure below:</p><pre><code>fmp_microservice_financial/
&#9500;&#9472; app/
&#9474;  &#9500;&#9472; __init__.py
&#9474;  &#9500;&#9472; main.py          # FastAPI app + routes
&#9474;  &#9500;&#9472; fmp_client.py    # Wrapper around FMP stable API
&#9474;  &#9492;&#9472; schemas.py       # Pydantic models for responses
&#9500;&#9472; requirements.txt
&#9500;&#9472; .env.example
&#9492;&#9472; Dockerfile</code></pre><p>At a high level, the microservice follows the flow below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ITRy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca3a951-be7c-44b9-af4d-fec9002f7b21_1933x752.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ITRy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca3a951-be7c-44b9-af4d-fec9002f7b21_1933x752.png 424w, https://substackcdn.com/image/fetch/$s_!ITRy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca3a951-be7c-44b9-af4d-fec9002f7b21_1933x752.png 848w, https://substackcdn.com/image/fetch/$s_!ITRy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca3a951-be7c-44b9-af4d-fec9002f7b21_1933x752.png 1272w, https://substackcdn.com/image/fetch/$s_!ITRy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca3a951-be7c-44b9-af4d-fec9002f7b21_1933x752.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ITRy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca3a951-be7c-44b9-af4d-fec9002f7b21_1933x752.png" width="1456" height="566" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aca3a951-be7c-44b9-af4d-fec9002f7b21_1933x752.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:566,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ITRy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca3a951-be7c-44b9-af4d-fec9002f7b21_1933x752.png 424w, https://substackcdn.com/image/fetch/$s_!ITRy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca3a951-be7c-44b9-af4d-fec9002f7b21_1933x752.png 848w, https://substackcdn.com/image/fetch/$s_!ITRy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca3a951-be7c-44b9-af4d-fec9002f7b21_1933x752.png 1272w, https://substackcdn.com/image/fetch/$s_!ITRy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faca3a951-be7c-44b9-af4d-fec9002f7b21_1933x752.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Financial microservice high-level flow</figcaption></figure></div><div><hr></div><h2><strong>Building Financial Microservice</strong></h2><p>Let&#8217;s start by filling in the <code>requirements.txt</code> file, which lists all the Python libraries we will use to build the financial microservice.</p><pre><code>fastapi
uvicorn
requests
python-dotenv
pydantic</code></pre><p>Based on these requirements, we will use FastAPI to build our endpoints and Pydantic to define the JSON output schema.</p><p>Next, we will set up the <code>.env</code> file to hold the environment variables used in this project. One requirement is the FMP Free API key, which you can obtain in the <a href="https://site.financialmodelingprep.com/developer/docs/dashboard">FMP dashboard</a>. Once you have the API key, fill the file as follows, replacing the placeholder with your own key:</p><pre><code>FMP_API_KEY=your_fmp_api_key_here
FMP_BASE_URL=https://financialmodelingprep.com/stable</code></pre><p>With the configuration done, we will set up the microservice application.</p><h3><strong>Building the FMP Client</strong></h3><p>We will start with the client that wraps the FMP API. To keep the rest of the microservice clean, we isolate all interactions with Financial Modeling Prep in a single class called <code>FMPClient</code>. This class knows how to read configuration, build URLs, attach the API key, and handle errors. Everything else in the codebase just calls methods like <code>get_income_statement("AAPL")</code> without worrying about those details.</p><p>Open the <code>fmp_client.py</code> file in the <code>app</code> package and fill it with the following code:</p><pre><code>import os
from typing import Any, Dict, List, Optional
import requests
from dotenv import load_dotenv

load_dotenv()

FMP_API_KEY = os.getenv("FMP_API_KEY")
FMP_BASE_URL = os.getenv("FMP_BASE_URL", "https://financialmodelingprep.com/stable")

if not FMP_API_KEY:
    raise RuntimeError(
        "FMP_API_KEY is not set. Please configure it in your environment or .env file."
    )

class FMPClient:
    """
    Thin wrapper over Financial Modeling Prep stable endpoints.

    Base: https://financialmodelingprep.com/stable
    Examples:
      - /search-symbol?query=AAPL&amp;apikey=...
      - /income-statement?symbol=AAPL&amp;period=annual&amp;limit=5&amp;apikey=...
    """

    def __init__(self, api_key: str = FMP_API_KEY, base_url: str = FMP_BASE_URL) -&gt; None:
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")

    def _get(self, endpoint: str, params: Optional[Dict[str, Any]] = None) -&gt; Any:
        """
        endpoint: e.g. 'search-symbol', 'income-statement', 'profile'
        """
        if params is None:
            params = {}
        params["apikey"] = self.api_key

        url = f"{self.base_url}/{endpoint.lstrip('/')}"
        resp = requests.get(url, params=params, timeout=10)

        if not resp.ok:
            raise RuntimeError(
                f"FMP API error: {resp.status_code} {resp.text[:200]}"
            )
        return resp.json()

    def search_symbol(self, query: str, limit: int = 10) -&gt; List[Dict[str, Any]]:
        """
        https://financialmodelingprep.com/stable/search-symbol?query=...&amp;limit=...&amp;exchange=...
        """
        return self._get(
            "search-symbol",
            {
                "query": query,
                "limit": limit,
                # you can adjust or drop the exchange filter
                "exchange": "NASDAQ,NYSE,AMEX",
            },
        )

    def get_company_profile(self, symbol: str) -&gt; List[Dict[str, Any]]:
        """
        https://financialmodelingprep.com/stable/profile?symbol=AAPL
        """
        return self._get(
            "profile",
            {"symbol": symbol.upper()},
        )

    def get_income_statement(
        self,
        symbol: str,
        period: str = "annual",
        limit: int = 5,
    ) -&gt; List[Dict[str, Any]]:
        """
        https://financialmodelingprep.com/stable/income-statement?symbol=AAPL&amp;period=annual&amp;limit=5
        """
        return self._get(
            "income-statement",
            {
                "symbol": symbol.upper(),
                "period": period,
                "limit": limit,
            },
        )

    def get_balance_sheet(
        self,
        symbol: str,
        period: str = "annual",
        limit: int = 5,
    ) -&gt; List[Dict[str, Any]]:
        """
        https://financialmodelingprep.com/stable/balance-sheet-statement?symbol=AAPL&amp;period=annual&amp;limit=5
        """
        return self._get(
            "balance-sheet-statement",
            {
                "symbol": symbol.upper(),
                "period": period,
                "limit": limit,
            },
        )

    def get_cash_flow(
        self,
        symbol: str,
        period: str = "annual",
        limit: int = 5,
    ) -&gt; List[Dict[str, Any]]:
        """
        https://financialmodelingprep.com/stable/cash-flow-statement?symbol=AAPL&amp;period=annual&amp;limit=5
        """
        return self._get(
            "cash-flow-statement",
            {
                "symbol": symbol.upper(),
                "period": period,
                "limit": limit,
            },
        )</code></pre><p>Let&#8217;s break down what happens in the code above. The first few lines set up the imports and load the configuration: the base URL to use for all API calls and the API key to attach. If the API key is missing, the module raises a <code>RuntimeError</code> as soon as it is imported, so the service fails fast instead of sending unauthenticated requests.</p><p>Next, we define the <code>FMPClient</code> class as a thin wrapper that encapsulates how to call FMP. The <code>api_key</code> and <code>base_url</code> default to the module-level variables, but can be overridden when instantiating the class. Also, <code>base_url.rstrip("/")</code> removes any trailing slash from the base URL, so we can join <code>base_url</code> and endpoint names without accidentally creating double slashes.</p><p>Then, we define the shared helper method <code>_get</code>, which the other methods of the <code>FMPClient</code> class rely on.</p><pre><code>def _get(self, endpoint: str, params: Optional[Dict[str, Any]] = None) -&gt; Any:</code></pre><p>The method accepts an endpoint name, such as <code>"search-symbol"</code> or <code>"income-statement"</code>, and an optional <code>params</code> dictionary, to which it always adds one crucial parameter: the <code>apikey</code>. It then builds the full URL, sends a GET request with <code>requests.get</code> (using a 10-second timeout), raises a <code>RuntimeError</code> if the response status is not OK, and otherwise returns <code>resp.json()</code>, the parsed JSON body from FMP.</p><p>The rest of the class defines small, descriptive methods for specific FMP endpoints. For example, the <code>"search-symbol"</code> endpoint:</p><pre><code>def search_symbol(self, query: str, limit: int = 10) -&gt; List[Dict[str, Any]]:</code></pre><p>This method takes a free-text <code>query</code> and an optional <code>limit</code>. 
It calls <code>_get</code> with the endpoint name <code>"search-symbol"</code> and a parameters dictionary.</p><p>From the rest of your code, you can write:</p><pre><code>client.search_symbol("AAPL")</code></pre><p>and get back a list of candidate companies without worrying about URLs or query string details.</p><p>This client centralizes our configuration and error handling and provides the high-level vocabulary for our microservice.</p><h3><strong>Building the Microservice Schema</strong></h3><p>To keep the output consistent, the microservice does not expose raw JSON from FMP directly. Instead, we define a small set of Pydantic models that precisely describe the fields clients can expect from each endpoint, independent of how FMP structures its responses. We define them in <code>schemas.py</code> with the following code:</p><pre><code>from typing import List, Optional
from pydantic import BaseModel, Field

class CompanySearchItem(BaseModel):
    symbol: str
    name: str
    exchange: Optional[str] = None
    currency: Optional[str] = None

class CompanySearchResponse(BaseModel):
    results: List[CompanySearchItem]

class IncomeSnapshot(BaseModel):
    revenue: Optional[float] = Field(
        None, description="Total revenue for the period"
    )
    netIncome: Optional[float] = Field(
        None, description="Net income for the period"
    )

class BalanceSheetSnapshot(BaseModel):
    totalAssets: Optional[float] = None
    totalLiabilities: Optional[float] = None

class CashFlowSnapshot(BaseModel):
    operatingCashFlow: Optional[float] = None

class CompanySnapshot(BaseModel):
    symbol: str
    name: Optional[str] = None
    currency: Optional[str] = None
    exchange: Optional[str] = None
    asOf: Optional[str] = Field(
        None, description="Financial statement date"
    )

    income: IncomeSnapshot
    balanceSheet: BalanceSheetSnapshot
    cashFlow: CashFlowSnapshot

class HistoryPoint(BaseModel):
    date: str
    revenue: Optional[float] = None
    netIncome: Optional[float] = None

class CompanyHistoryResponse(BaseModel):
    symbol: str
    points: List[HistoryPoint]</code></pre><p>These Pydantic models define our microservice&#8217;s public interface, which stays stable even when FMP&#8217;s responses change. They also feed the API&#8217;s self-documentation (through Swagger UI) and keep the microservice focused, because we decide the output structure.</p><p>You can change the schema above as needed. What matters is that you understand both the FMP outputs and the result you want. These models are used together with the client we set up previously in the application, which lives in <code>main.py</code>.</p><h3><strong>Building the Microservice Application</strong></h3><p>The <code>main.py</code> file is where the microservice becomes a real API that other programs can call. We define it as follows:</p><pre><code>from typing import List
from fastapi import Depends, FastAPI, HTTPException, Query
from fastapi.responses import JSONResponse
from app.fmp_client import FMPClient
from app.schemas import (
    CompanySearchItem,
    CompanySearchResponse,
    CompanySnapshot,
    IncomeSnapshot,
    BalanceSheetSnapshot,
    CashFlowSnapshot,
    HistoryPoint,
    CompanyHistoryResponse,
)

app = FastAPI(
    title="Company Fundamentals Microservice",
    version="0.1.0",
    description=(
        "Minimal open-source service that wraps Financial Modeling Prep "
        "stable fundamentals endpoints."
    ),
)

def get_client() -&gt; FMPClient:
    return FMPClient()

@app.get("/health")
def health_check() -&gt; dict:
    return {"status": "ok"}

@app.get(
    "/companies/search",
    response_model=CompanySearchResponse,
    summary="Search for companies by name or symbol",
)
def search_companies(
    q: str = Query(..., min_length=1, description="Search query"),
    limit: int = Query(10, ge=1, le=50),
    client: FMPClient = Depends(get_client),
):
    raw = client.search_symbol(q, limit=limit)
    results: List[CompanySearchItem] = []

    for item in raw:
        results.append(
            CompanySearchItem(
                symbol=item.get("symbol"),
                name=item.get("name") or item.get("companyName"),
                exchange=item.get("stockExchange"),
                currency=item.get("currency"),
            )
        )

    return CompanySearchResponse(results=results)

@app.get(
    "/companies/{symbol}/snapshot",
    response_model=CompanySnapshot,
    summary="Latest fundamentals snapshot for a given company",
)
def company_snapshot(
    symbol: str,
    client: FMPClient = Depends(get_client),
):
    profiles = client.get_company_profile(symbol)
    if not profiles:
        raise HTTPException(status_code=404, detail="Company profile not found")

    profile = profiles[0]
    name = profile.get("companyName") or profile.get("name")
    currency = profile.get("currency")
    exchange = profile.get("exchangeShortName") or profile.get("exchange")

    income_list = client.get_income_statement(symbol, period="annual", limit=1)
    balance_list = client.get_balance_sheet(symbol, period="annual", limit=1)
    cashflow_list = client.get_cash_flow(symbol, period="annual", limit=1)

    income_raw = income_list[0] if income_list else {}
    balance_raw = balance_list[0] if balance_list else {}
    cashflow_raw = cashflow_list[0] if cashflow_list else {}

    as_of = (
        income_raw.get("date")
        or balance_raw.get("date")
        or cashflow_raw.get("date")
    )

    income = IncomeSnapshot(
        revenue=income_raw.get("revenue") or income_raw.get("revenueTTM"),
        netIncome=income_raw.get("netIncome") or income_raw.get("netIncomeTTM"),
    )

    balance = BalanceSheetSnapshot(
        totalAssets=balance_raw.get("totalAssets"),
        totalLiabilities=balance_raw.get("totalLiabilities"),
    )

    cashflow = CashFlowSnapshot(
        operatingCashFlow=cashflow_raw.get("operatingCashFlow")
        or cashflow_raw.get("operatingCashFlowTTM")
    )

    snapshot = CompanySnapshot(
        symbol=str(symbol).upper(),
        name=name,
        currency=currency,
        exchange=exchange,
        asOf=as_of,
        income=income,
        balanceSheet=balance,
        cashFlow=cashflow,
    )

    return snapshot

@app.get(
    "/companies/{symbol}/history",
    response_model=CompanyHistoryResponse,
    summary="Simple revenue/net income history for charting",
)
def company_history(
    symbol: str,
    years: int = Query(5, ge=1, le=20),
    client: FMPClient = Depends(get_client),
):
    income_list = client.get_income_statement(
        symbol, period="annual", limit=years
    )

    if not income_list:
        raise HTTPException(status_code=404, detail="No income statement data found")

    points: List[HistoryPoint] = []
    for row in income_list:
        points.append(
            HistoryPoint(
                date=row.get("date"),
                revenue=row.get("revenue"),
                netIncome=row.get("netIncome"),
            )
        )

    return CompanyHistoryResponse(symbol=str(symbol).upper(), points=points)

@app.exception_handler(RuntimeError)
def runtime_error_handler(request, exc: RuntimeError):
    return JSONResponse(
        status_code=502,
        content={"detail": str(exc)},
    )</code></pre><p>Let&#8217;s break down what happens in the code above.</p><p>First, we create the FastAPI application with metadata, including <code>title</code>, <code>version</code>, and <code>description</code>, which appear in the auto-generated Swagger UI at <code>/docs</code>.</p><p>Next, we define the <code>get_client</code> dependency function, which tells FastAPI how to create an <code>FMPClient</code> when an endpoint needs one.</p><pre><code>def get_client() -&gt; FMPClient:
    return FMPClient()</code></pre><p>Later, in each route, you will see:</p><pre><code>client: FMPClient = Depends(get_client)</code></pre><p>This keeps client construction in one place and makes it easier to swap in a mock client for testing.</p><p>With the application created, we set up the endpoint routes. Each route returns different information. For example, the <code>/companies/{symbol}/snapshot</code> route returns the company&#8217;s fundamental information:</p><pre><code>@app.get(
    "/companies/{symbol}/snapshot",
    response_model=CompanySnapshot,
    summary="Latest fundamentals snapshot for a given company",
)
def company_snapshot(
    symbol: str,
    client: FMPClient = Depends(get_client),
):</code></pre><p>The endpoint performs five steps:</p><ol><li><p><strong>Fetch the basic profile</strong></p></li><li><p><strong>Fetch the latest financial statements</strong></p></li><li><p><strong>Determine the &#8220;as of&#8221; date</strong></p></li><li><p><strong>Build the snapshot components</strong></p></li><li><p><strong>Assemble the </strong><code>CompanySnapshot</code></p></li></ol><p>The endpoint returns this <code>CompanySnapshot</code>. FastAPI serializes it to JSON and automatically documents it. Any <code>RuntimeError</code> raised by the client, such as a failed FMP request, is caught by the exception handler at the bottom of the file and returned as a 502 response.</p><h3><strong>Running the Microservice</strong></h3><p>With the application in place, let&#8217;s test the microservice. We can do that by running the following command in the CLI from the project root, the directory that contains the <code>app</code> package:</p><pre><code>uvicorn app.main:app --reload</code></pre><p>If it runs correctly, you should see output like the following in your CLI:</p><pre><code>INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [9084] using WatchFiles
INFO:     Started server process [27492]
INFO:     Waiting for application startup.
INFO:     Application startup complete.</code></pre><p>Let&#8217;s check the microservice we just created. Because FastAPI generates the documentation for us, we can open it at the following URI in a browser:</p><pre><code>http://localhost:8000/docs</code></pre><p>Open the URI above, and you will see the microservice documentation:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8lQY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16fdb594-d7c3-41e8-bd81-19af90df3e51_1400x945.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8lQY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16fdb594-d7c3-41e8-bd81-19af90df3e51_1400x945.png 424w, https://substackcdn.com/image/fetch/$s_!8lQY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16fdb594-d7c3-41e8-bd81-19af90df3e51_1400x945.png 848w, https://substackcdn.com/image/fetch/$s_!8lQY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16fdb594-d7c3-41e8-bd81-19af90df3e51_1400x945.png 1272w, https://substackcdn.com/image/fetch/$s_!8lQY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16fdb594-d7c3-41e8-bd81-19af90df3e51_1400x945.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8lQY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16fdb594-d7c3-41e8-bd81-19af90df3e51_1400x945.png" width="1400" height="945" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16fdb594-d7c3-41e8-bd81-19af90df3e51_1400x945.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:945,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!8lQY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16fdb594-d7c3-41e8-bd81-19af90df3e51_1400x945.png 424w, https://substackcdn.com/image/fetch/$s_!8lQY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16fdb594-d7c3-41e8-bd81-19af90df3e51_1400x945.png 848w, https://substackcdn.com/image/fetch/$s_!8lQY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16fdb594-d7c3-41e8-bd81-19af90df3e51_1400x945.png 1272w, https://substackcdn.com/image/fetch/$s_!8lQY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16fdb594-d7c3-41e8-bd81-19af90df3e51_1400x945.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Try to check out one of the endpoints, for example, the <code>/health</code> endpoint:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LN36!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8876c7d4-95ee-4741-9891-02ba7f7822c7_1400x725.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LN36!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8876c7d4-95ee-4741-9891-02ba7f7822c7_1400x725.png 424w, https://substackcdn.com/image/fetch/$s_!LN36!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8876c7d4-95ee-4741-9891-02ba7f7822c7_1400x725.png 848w, 
https://substackcdn.com/image/fetch/$s_!LN36!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8876c7d4-95ee-4741-9891-02ba7f7822c7_1400x725.png 1272w, https://substackcdn.com/image/fetch/$s_!LN36!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8876c7d4-95ee-4741-9891-02ba7f7822c7_1400x725.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LN36!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8876c7d4-95ee-4741-9891-02ba7f7822c7_1400x725.png" width="1400" height="725" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8876c7d4-95ee-4741-9891-02ba7f7822c7_1400x725.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:725,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!LN36!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8876c7d4-95ee-4741-9891-02ba7f7822c7_1400x725.png 424w, https://substackcdn.com/image/fetch/$s_!LN36!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8876c7d4-95ee-4741-9891-02ba7f7822c7_1400x725.png 848w, 
https://substackcdn.com/image/fetch/$s_!LN36!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8876c7d4-95ee-4741-9891-02ba7f7822c7_1400x725.png 1272w, https://substackcdn.com/image/fetch/$s_!LN36!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8876c7d4-95ee-4741-9891-02ba7f7822c7_1400x725.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We can see that the endpoint executes correctly and returns the expected response.</p><p>Let&#8217;s try out the other endpoint, such 
as <code>/companies/{symbol}/snapshot</code> to acquire the company&#8217;s financial fundamentals:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c3nr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ab4a3c-9f18-4afb-ae76-338a0a44a0b1_1400x934.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c3nr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ab4a3c-9f18-4afb-ae76-338a0a44a0b1_1400x934.png 424w, https://substackcdn.com/image/fetch/$s_!c3nr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ab4a3c-9f18-4afb-ae76-338a0a44a0b1_1400x934.png 848w, https://substackcdn.com/image/fetch/$s_!c3nr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ab4a3c-9f18-4afb-ae76-338a0a44a0b1_1400x934.png 1272w, https://substackcdn.com/image/fetch/$s_!c3nr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ab4a3c-9f18-4afb-ae76-338a0a44a0b1_1400x934.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c3nr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ab4a3c-9f18-4afb-ae76-338a0a44a0b1_1400x934.png" width="1400" height="934" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6ab4a3c-9f18-4afb-ae76-338a0a44a0b1_1400x934.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:934,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!c3nr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ab4a3c-9f18-4afb-ae76-338a0a44a0b1_1400x934.png 424w, https://substackcdn.com/image/fetch/$s_!c3nr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ab4a3c-9f18-4afb-ae76-338a0a44a0b1_1400x934.png 848w, https://substackcdn.com/image/fetch/$s_!c3nr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ab4a3c-9f18-4afb-ae76-338a0a44a0b1_1400x934.png 1272w, https://substackcdn.com/image/fetch/$s_!c3nr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6ab4a3c-9f18-4afb-ae76-338a0a44a0b1_1400x934.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>From the image above, we can see that the microservice successfully calls multiple FMP endpoints and returns the concise output we need.</p><h3><strong>Microservice Containerization</strong></h3><p>Lastly, we will containerize our microservice. So far, we have a working microservice that runs locally. That&#8217;s fine for development, but as soon as we want to share the service with someone else or deploy it somewhere other than our own laptop, we can run into dependency issues.</p><p>Containerizing the service with Docker provides a self-contained, reproducible environment that anyone with Docker can run, regardless of their local setup.</p><p>To containerize the service, first install <a href="https://www.docker.com/products/docker-desktop/">Docker Desktop</a>. Then, fill the <code>Dockerfile</code> with the following code:</p><pre><code>FROM python:3.11-slim

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /code

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app ./app

EXPOSE 8000

# Run the FastAPI app with uvicorn
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]</code></pre><p>Next, we will build the Docker image with the following command:</p><pre><code>docker build -t microservice-financial-service .</code></pre><p>The build above produces a reusable image that we can run and share with others. Assuming your <code>.env</code> file is filled in correctly, we can run the container with the following command:</p><pre><code>docker run --env-file .env -p 8000:8000 microservice-financial-service</code></pre><p>Then, visit <code>http://localhost:8000/docs</code> once more to access the microservice documentation.</p><p>With the microservice running in the container, we can test it from a Jupyter Notebook with the following code:</p><pre><code>import requests

# Address of the containerized microservice (port 8000 is published by docker run)
BASE_URL = "http://127.0.0.1:8000"
symbol = "AAPL"

# Request the company snapshot for the chosen ticker
response = requests.get(f"{BASE_URL}/companies/{symbol}/snapshot")
print("Status:", response.status_code)
snapshot = response.json()
snapshot</code></pre><p>The output result looks like this:</p><pre><code>Status: 200
{'symbol': 'AAPL',
 'name': 'Apple Inc.',
 'currency': 'USD',
 'exchange': 'NASDAQ',
 'asOf': '2025-09-27',
 'income': {'revenue': 416161000000.0, 'netIncome': 112010000000.0},
 'balanceSheet': {'totalAssets': 359241000000.0,
  'totalLiabilities': 285508000000.0},
 'cashFlow': {'operatingCashFlow': 111482000000.0}}</code></pre><p>Overall, our FMP-backed financial microservice works well and is ready for any follow-up work.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/building-an-open-source-microservice?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/building-an-open-source-microservice?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h2><strong>Conclusion</strong></h2><p>In this article, we have turned Financial Modeling Prep&#8217;s stable API into a small and reusable microservice that better meets our company&#8217;s needs than the raw endpoints.</p><p>By wrapping core functions such as search, snapshot, and history in FastAPI, Pydantic schemas, and a lightweight Docker image, we now have a straightforward, well-defined interface for our data acquisition.</p><p>You can use this as a drop-in data layer for notebooks, dashboards, or internal tools, and expand it over time with new endpoints, caching, or authentication as your use cases develop.</p>]]></content:encoded></item><item><title><![CDATA[Introduction to Open‑Source Image Generation Models: A Beginner’s Guide]]></title><description><![CDATA[Gentle introduction to understand the image generation AI]]></description><link>https://www.nb-data.com/p/introduction-to-opensource-image</link><guid isPermaLink="false">https://www.nb-data.com/p/introduction-to-opensource-image</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Sun, 09 Nov 2025 12:47:29 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!UgXB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UgXB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UgXB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg 424w, https://substackcdn.com/image/fetch/$s_!UgXB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg 848w, https://substackcdn.com/image/fetch/$s_!UgXB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!UgXB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UgXB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg" width="1312" height="736" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:736,&quot;width&quot;:1312,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98353,&quot;alt&quot;:&quot;Introduction to Open&#8209;Source Image Generation Models: A Beginner&#8217;s Guide&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/178345084?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Introduction to Open&#8209;Source Image Generation Models: A Beginner&#8217;s Guide" title="Introduction to Open&#8209;Source Image Generation Models: A Beginner&#8217;s Guide" srcset="https://substackcdn.com/image/fetch/$s_!UgXB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg 424w, https://substackcdn.com/image/fetch/$s_!UgXB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg 848w, https://substackcdn.com/image/fetch/$s_!UgXB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!UgXB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd41827a0-51fb-4dd2-8143-6415a37c4ce3_1312x736.jpeg 
1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image by Author | Ideogram.ai</figcaption></figure></div><h1>Introduction</h1><p>Open&#8209;source image generation models are AI tools that create pictures based on text descriptions, and they are freely available for anyone to use or modify. In simple terms, you can type in a prompt (for example, &#8220;a medieval knight on a horse at sunset&#8221;), and the model will generate an image matching that description. </p><p>These models rose to prominence around 2022 when AI image generators went mainstream.  
This began with OpenAI&#8217;s proprietary DALL&#8209;E 2 and continued soon after with the open-source <a href="https://stability.ai/stable-image">Stable Diffusion model</a> released by Stability AI. </p><p>Unlike closed systems (such as Midjourney or DALL&#8209;E, which you can only access via paid services or APIs), open-source models have no paywalls and far fewer usage restrictions, allowing anyone to run them locally or in the cloud without the typical costs or restrictions of proprietary software. </p><p>In this article, we will explore open-source image generation models and how you can navigate them. </p><p>Let&#8217;s get into it.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>Key Advantages</h1><p>Open-source image generation models are powerful AI art tools that put creative control directly in the users&#8217; hands, free of charge and open for customization by the community. </p><p>There are many advantages to using open-source image generation models, including:</p><ul><li><p><strong>Cost Efficiency:</strong> These models are available without licensing fees or subscription costs. You can run them on your own hardware or affordable cloud instances, avoiding the pay-per-image charges of some commercial services. In short, aside from hardware or electricity, generating images with an open model is practically free.</p></li><li><p><strong>Flexibility &amp; Customization:</strong> Since the code and weights are open, you have the freedom to customize the model to suit your needs. You can adjust parameters, change the model&#8217;s code, or even fine-tune it on your own images to create a specific style. 
This allows developers or artists to build the tool according to their vision rather than being limited to a generic service. For example, developers have made custom versions of Stable Diffusion for medical imaging, anime art, interior design, and more &#8211; all made possible by the flexible open license.</p></li><li><p><strong>Transparency (Trust &amp; Understanding):</strong> Open-source models enable anyone to see how they work internally. The model&#8217;s architecture and training data can be scrutinized for biases or problems, which helps build trust. There&#8217;s no hidden "secret sauce" behind closed doors, as researchers and users can review the model&#8217;s behavior and make sure it isn&#8217;t doing anything harmful. This openness also encourages learning; students and engineers can study actual, cutting-edge model code to improve their understanding of AI.</p></li><li><p><strong>Community-Driven Innovation:</strong> A vibrant community surrounds these models, leading to rapid updates and contributions worldwide. Developers share features, improvements, and fixes, allowing open models to advance faster than proprietary ones. For example, the Stable Diffusion community has developed a broad ecosystem of plugins, enhancements, and fine-tuned checkpoints. Many community-trained versions are available online for various aesthetics or tasks. This collaborative environment means that if you face a problem or seek a new feature, a solution is likely already available or in progress.</p></li><li><p><strong>No Hard Usage Limits:</strong> Unlike some proprietary tools that may limit the number of images you can generate or impose content restrictions, open-source tools allow you to generate as many as your hardware can support. There&#8217;s no rate limiting or mandatory censorship built into the model itself.</p></li><li><p><strong>Educational Value:</strong> Open models are a great resource for education and research. 
Students, researchers, or anyone interested can experiment with them to learn about AI image creation. Since everything is accessible, you can observe how modifying the code or training data influences the results, which is very helpful for understanding machine learning. This open access speeds up progress in both academia and industry in generative AI.</p></li></ul><p>These are the benefits you can expect from using the open-source image generation model. However, there are still challenges that come with using these models.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/introduction-to-opensource-image?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/introduction-to-opensource-image?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h2>Disadvantages and Challenges</h2><p>Despite their many benefits, open-source image generation models also present some challenges and drawbacks that users should consider:</p><ul><li><p><strong>High Hardware Requirements:</strong> Running advanced image models requires a powerful computer, ideally a modern GPU with ample VRAM. Generating high-resolution or multiple images can be resource-intensive, making it difficult for basic laptops or phones to run models like Stable Diffusion locally. Users may need hardware upgrades or cloud services for good performance. (For example, generating a 512&#215;512 image typically needs a GPU with 4&#8211;8 GB VRAM and can take several seconds.)</p></li><li><p><strong>Technical Complexity:</strong> The open-source community aims to make these tools user-friendly, but they aren&#8217;t always plug-and-play. 
Setting up and running a model might involve working with Python environments, drivers, and command-line interfaces, which can intimidate beginners. Even the popular <a href="https://github.com/AUTOMATIC1111/stable-diffusion-webui">AUTOMATIC1111 Stable Diffusion web UI</a> has many features, which can overwhelm new users. Using open models fully often requires technical knowledge, and troubleshooting issues like installation errors or GPU incompatibilities is part of learning. Advanced features like training custom models or chaining multiple models need even more expertise.</p></li><li><p><strong>Quality Limitations and Trade-offs:</strong> Open models can produce impressive results but aren't perfect, sometimes generating artifacts or errors like distorted hands or text. Outputs vary, so you may need to adjust prompts or settings. While proprietary models like Midjourney are optimized for specific styles, open models may require extra tuning. Sometimes the images look great but lack logical consistency, as models mimic patterns without understanding scenes. Expect trial and error for the desired quality.</p></li><li><p><strong>Ethical Concerns (Bias and Misuse):</strong> Open models learn from large datasets that can contain biases, leading to skewed representations, especially if certain demographics are overrepresented. They often lack enforced filters to prevent harmful content, raising ethical concerns about misuse, such as generating violent or misleading images. While open-source freedom enables innovation, it also allows malicious use, making it a double-edged sword.</p></li><li><p><strong>Legal and Copyright Questions:</strong> There are debates about the legality of images from these models, as their training data often includes copyrighted images scraped from the web without permission. This has led to lawsuits and uncertainty over infringement when outputs mimic styles or images closely. Commercial use of AI art might face legal issues until laws are updated. 
Unlike proprietary services that often restrict generating images of real people or copyrighted characters, open models will generally attempt whatever is asked, risking legal trouble if used improperly. It&#8217;s important to stay informed about legal changes and use the technology ethically.</p></li></ul><p>These are the challenges and disadvantages you may encounter when using open-source image generation models.</p><div><hr></div><h1>How Does an Open-Source Image Generation Model Work?</h1><p>Under the hood, most modern open-source image generators use a process called diffusion to create images. In simple terms, the model starts with a field of random noise and gradually refines it into a coherent picture that matches your prompt.</p><p>Diffusion models are a class of generative models: they learn the patterns in their training data so they can generate new data that resembles it. For image generation, this means creating new images that match the input given.</p><p>For diffusion models, the process differs from traditional methods, as it involves adding and then removing noise from the data. Essentially, the model modifies the images and refines them to generate the final output. Think of it as a denoising process where the model learns to remove noise from images.</p><p>The diffusion model was originally introduced in the paper <em><a href="https://arxiv.org/abs/1503.03585">&#8220;Deep Unsupervised Learning using Nonequilibrium Thermodynamics&#8221;</a></em> by Sohl-Dickstein et al. (2015). It describes converting data into noise via a controlled forward diffusion process and training a model to reverse this process, reconstructing the data through denoising.</p><p>Building on this foundation, Ho et al. 
(2020) in their paper&nbsp;<em><a href="https://arxiv.org/abs/2006.11239?">"Denoising Diffusion Probabilistic Models"</a></em>&nbsp;introduce the modern diffusion framework, capable of generating high-quality images and surpassing earlier popular models such as Generative Adversarial Networks (GANs). Typically, the diffusion model involves two essential stages:</p><ol><li><p><strong>Forward (diffusion) process</strong>: Data is progressively corrupted by noise addition until it appears as random static.</p></li><li><p><strong>Reverse (denoising) process</strong>: Involves training a neural network to gradually eliminate noise and learn to reconstruct image data starting from pure randomness.</p></li></ol><p>In practice, these steps are performed <strong>in latent space</strong> using a variational autoencoder (VAE): the model denoises compact latent representations and then decodes them back to pixels. Let&#8217;s now examine the components of the diffusion model more closely to make this concrete.</p><div><hr></div><h3><strong>Forward Process</strong></h3><p>The forward process is the first phase, where the images are systematically degraded by noise until they become random static.</p><p>The forward process is controlled and iterative, which we can summarize in the following steps:</p><ol><li><p><strong>Begin with an image dataset</strong></p></li><li><p><strong>Add a small amount of noise</strong> to the image.</p></li><li><p><strong>Repeat</strong> this process many times, possibly hundreds or thousands of times, each time further corrupting the image.</p></li><li><p>After enough steps, the original image will become just <strong>pure noise</strong>.</p></li></ol><p>The process described above is often represented mathematically as a Markov chain because each noisy version depends only on the one right before it, not on the full sequence of steps.</p><p>Why do we gradually turn the image into noise instead of doing it all at once? 
Our goal in the forward process is to help the model learn to reverse the corruption step by step. Using gradual steps allows the model to learn how to go from noisy data to clearer data. Each small step is a much easier denoising problem than undoing all of the corruption at once.</p><p>The amount of noise added at each step is set by a noise schedule. For example, linear schedules increase the noise at a steady rate, while cosine schedules add noise more slowly and maintain useful image features for a longer duration.</p><p>That&#8217;s a quick summary of the Forward Process. Let&#8217;s explore the Reverse Process further.</p><div><hr></div><h3><strong>Reverse Process</strong></h3><p>After the forward process, the next step is to turn the model into a generator that learns to convert noise into image data. Through small, iterative adjustments, the model can generate new, previously nonexistent images.</p><p>In general, the <strong>reverse process</strong> is the inverse of the forward process:</p><ol><li><p><strong>Begin with pure noise,</strong> which is an entirely random image made up of Gaussian noise.</p></li><li><p><strong>Iteratively remove noise</strong> with a trained model that simulates reversing each forward step. In every iteration, the model receives the current noisy image and its timestep, then predicts how to lower the noise level based on what it learned during training.</p></li><li><p><strong>Gradually,</strong> the image becomes clearer, resulting in usable image data.</p></li></ol><p>This reverse process depends on a well-trained model that can effectively denoise noisy images. Diffusion models typically employ a neural network architecture like a <strong>U-Net</strong>, a convolutional encoder&#8211;decoder network with skip connections between matching layers. During training, the model learns to predict the noise added in the forward process. 
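</p><p>The forward noising step and the noise-prediction target described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code of any particular model: the function names, the 1,000-step linear schedule, and the 16-value toy &#8220;image&#8221; are assumptions chosen for the example. It relies on a property from the DDPM paper by Ho et al. (2020): the noisy sample at any step <em>t</em> can be drawn directly as a weighted mix of the clean data and Gaussian noise.</p><pre><code>import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variance, rising linearly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha(betas):
    """alpha_bar[t]: share of the original signal variance left after step t."""
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def forward_noise(x0, t, alpha_bar, rng):
    """Jump straight to step t by mixing clean values with Gaussian noise."""
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * p + noise * e for p, e in zip(x0, eps)]
    return x_t, eps  # eps is the target a denoising network learns to predict


def recover_x0(x_t, eps_pred, t, alpha_bar):
    """Invert the mix above, given a prediction of the noise that was added."""
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    return [(x - noise * e) / signal for x, e in zip(x_t, eps_pred)]


rng = random.Random(0)
alpha_bar = cumulative_alpha(linear_beta_schedule(1000))
x0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # toy "image": 16 pixel values
x_t, eps = forward_noise(x0, 500, alpha_bar, rng)
x0_hat = recover_x0(x_t, eps, 500, alpha_bar)  # true noise stands in for a perfect model
</code></pre><p>If a model predicted the added noise perfectly, as the stand-in <code>eps</code> does here, the clean values could be recovered exactly; training pushes the network toward that prediction.</p><p>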
At each step, the model also takes the timestep into account, enabling it to adjust its predictions according to the noise level.</p><p>The model is usually trained with a loss function like <strong>mean squared error (MSE)</strong>, which measures the difference between predicted and actual noise. By reducing this loss across many examples, the model gradually becomes skilled at reversing the diffusion process.</p><p>Compared to options like Generative Adversarial Networks (GANs), diffusion models provide greater training stability, because there is a single denoising objective rather than an adversarial game between two networks. The step-by-step denoising method makes training more reliable and easier to reason about.</p><p>Once the model is fully trained, creating a new image follows the reverse process summarized above.</p><div class="directMessage button" data-attrs="{&quot;userId&quot;:6000855,&quot;userName&quot;:&quot;Cornellius Yudha Wijaya&quot;,&quot;canDm&quot;:null,&quot;dmUpgradeOptions&quot;:null,&quot;isEditorNode&quot;:true}" data-component-name="DirectMessageToDOM"></div><div><hr></div><h3><strong>Text Conditioning</strong></h3><p>Many open-source image generation models can guide the reverse process using text prompts, which is called text conditioning. By incorporating natural language, we get a matching scene instead of random visuals.</p><p>The system uses a pre-trained text encoder (such as the CLIP text encoder; SDXL adds OpenCLIP, and Stable Diffusion 3 adds T5) to convert the prompt into a vector or sequence of embeddings. These embeddings are then fed into the diffusion U-Net through cross-attention, enabling the network to concentrate on relevant words and phrases as it denoises. 
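</p><p>To make the cross-attention step concrete, here is a toy sketch in plain Python. It is a simplification under stated assumptions: real models apply learned projection matrices to produce queries, keys, and values and use many attention heads, while this example uses hand-written two-dimensional vectors and hypothetical function names. Each image position (the query) scores every prompt token (the keys), and a softmax turns the scores into weights that decide how much of each token&#8217;s information flows into that position.</p><pre><code>import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_queries, text_keys, text_values):
    """Scaled dot-product attention from image positions to prompt tokens."""
    dim = len(text_keys[0])
    outputs, weights = [], []
    for query in image_queries:
        # Similarity between this image position and every prompt token
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in text_keys
        ]
        attn = softmax(scores)
        # Weighted average of the token values, one output vector per position
        mixed = [
            sum(w * value[i] for w, value in zip(attn, text_values))
            for i in range(len(text_values[0]))
        ]
        weights.append(attn)
        outputs.append(mixed)
    return outputs, weights


# Hypothetical embeddings: two prompt tokens and two image patches
text_tokens = [[1.0, 0.0], [0.0, 1.0]]    # e.g. "knight", "sunset"
image_patches = [[4.0, 0.0], [0.0, 4.0]]  # patch 0 resembles token 0, patch 1 token 1
outputs, weights = cross_attention(image_patches, text_tokens, text_tokens)
</code></pre><p>In this toy setup, the first patch puts most of its weight on the first token and the second patch on the second, which is the sense in which the network concentrates on the relevant words.</p><p>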
During each step of the reverse process, the model references both the current noisy sample and the text embeddings, employing cross-attention to align emerging visual features with the prompt&#8217;s semantics.</p><p>Many implementations also use classifier-free guidance (CFG): the network blends unconditional and conditional predictions, with a guidance scale determining how closely the image follows the prompt. In latent-diffusion setups, all conditioning occurs in latent space, and a VAE decoder then converts the final latent back into pixels.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share Non-Brand Data&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share Non-Brand Data</span></a></p><div><hr></div><h1>Notable Open-Source Text-to-Image Models (2025)</h1><ul><li><p><strong><a href="https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5">Stable Diffusion v1.5</a></strong> &#8211; The original Stable Diffusion (by CompVis/StabilityAI) is a latent diffusion text-to-image model capable of generating photorealistic images from text prompts.</p></li><li><p><strong><a href="https://huggingface.co/stabilityai/stable-diffusion-2-1">Stable Diffusion v2.1</a></strong> &#8211; A newer StabilityAI release, SD v2.1, is a refined latent diffusion model (768&#215;768) that also creates and edits images from text. </p></li><li><p><strong><a href="https://huggingface.co/stabilityai/stable-diffusion-3-medium">Stable Diffusion 3 Medium (MMDiT)</a></strong> &#8211; A mid-sized &#8220;Stable Diffusion 3&#8221; model utilizing the new Multimodal Diffusion Transformer (MMDiT) architecture. 
</p></li><li><p><strong><a href="https://huggingface.co/stabilityai/stable-diffusion-3.5-large">Stable Diffusion 3.5 Large (MMDiT)</a></strong> &#8211; A larger MMDiT model in the Stable Diffusion 3 family, optimized for top quality. SD3.5 Large &#8220;offers improved performance in image quality, typography, complex prompt understanding, and resource efficiency.&#8221; </p></li><li><p><strong><a href="https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0">Stable Diffusion XL 1.0 (base)</a></strong> &#8211; The flagship high-capacity SDXL model. The SDXL 1.0 base model is a latent diffusion model using two large pretrained text encoders (OpenCLIP ViT-G and CLIP ViT-L) to handle nuanced prompts. </p></li><li><p><strong><a href="https://huggingface.co/ByteDance/SDXL-Lightning">SDXL-Lightning (ByteDance)</a></strong> &#8211; A research model by ByteDance that distills Stable Diffusion XL for speed. SDXL-Lightning &#8220;is a lightning-fast text-to-image generation model&#8221; that can produce 1024px images in only a few diffusion steps. </p></li><li><p><strong><a href="https://huggingface.co/black-forest-labs/FLUX.1-dev">FLUX.1 (Black Forest Labs)</a></strong> &#8211; A modern open-weights rectified-flow transformer (&#8776;12B params) for high-fidelity text-to-image generation, with strong prompt following and DiT-style efficiency. 
</p></li><li><p><strong><a href="https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic">Playground v2.5 (Playground AI)</a></strong> &#8211; An SDXL-style latent-diffusion base tuned for aesthetic 1024&#215;1024 results and robust aspect ratios.</p></li><li><p><strong><a href="https://github.com/Tencent-Hunyuan/HunyuanImage-3.0">HunyuanImage-3.0 (Tencent)</a></strong> &#8211; A native multimodal open-weights system whose text-to-image module targets parity with leading closed models; active, fast-moving repo with inference code and weights.</p></li><li><p><strong><a href="https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS">PixArt-&#931; (PixArt-alpha)</a></strong> &#8211; A Diffusion-Transformer (DiT) base that can generate images up to 4K resolution directly in a single sampling pass; an influential open alternative to UNet-based LDMs.</p></li></ul><p>Each of the above models has openly available weights, is still widely used, and can be a solid starting point for your own work.</p><div><hr></div><p>That&#8217;s all for this simple introduction to open-source image generation models. 
If you like the article, don&#8217;t forget to share and comment.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/introduction-to-opensource-image?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/introduction-to-opensource-image?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/introduction-to-opensource-image/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/introduction-to-opensource-image/comments"><span>Leave a comment</span></a></p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[14 Portfolio Projects That Demonstrate Real Business Value]]></title><description><![CDATA[Learn from these projects to improve your data career]]></description><link>https://www.nb-data.com/p/14-portfolio-projects-that-demonstrate</link><guid isPermaLink="false">https://www.nb-data.com/p/14-portfolio-projects-that-demonstrate</guid><dc:creator><![CDATA[Cornellius Yudha Wijaya]]></dc:creator><pubDate>Tue, 28 Oct 2025 14:30:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!qMFC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!qMFC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qMFC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg 424w, https://substackcdn.com/image/fetch/$s_!qMFC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg 848w, https://substackcdn.com/image/fetch/$s_!qMFC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!qMFC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qMFC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg" width="1312" height="736" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:736,&quot;width&quot;:1312,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:95897,&quot;alt&quot;:&quot;14 Portfolio Projects That Demonstrate Real Business 
Value&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nb-data.com/i/177370160?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="14 Portfolio Projects That Demonstrate Real Business Value" title="14 Portfolio Projects That Demonstrate Real Business Value" srcset="https://substackcdn.com/image/fetch/$s_!qMFC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg 424w, https://substackcdn.com/image/fetch/$s_!qMFC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg 848w, https://substackcdn.com/image/fetch/$s_!qMFC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!qMFC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F477891ae-e9c7-40fe-b307-95539df62bde_1312x736.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 
7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image by Author | Ideogram.ai</figcaption></figure></div><p>We live in an era when data has become a commodity that every business wants to use. That&#8217;s why many companies are willing to pay a lot of money for the best data scientists.</p><p>With so much competition, the best way to stand out is to build a data science portfolio that addresses real business problems with measurable results.</p><p>Below are 14 real-world-inspired projects you can draw on. Each one covers the business problem, the approach, the measurable impact, and the production deployment.</p><p>Curious about it? 
Let&#8217;s get into it.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2><strong>1. Netflix Content Recommendation Engine</strong> </h2><ul><li><p><strong>Business context:</strong> Netflix needed to keep subscribers engaged by surfacing relevant shows. Its personalization system tailors each user&#8217;s homepage.</p></li><li><p><strong>Tech/method:</strong> A hybrid recommendation pipeline (collaborative filtering, deep learning, extensive content tagging, and ranking). Netflix tags content into ~76,000 &#8220;micro-genres&#8221; and uses multiple models to match users to content.</p></li><li><p><strong>Metrics/results:</strong> The recommendation engine drives about 75&#8211;80% of all viewing hours. This personalization substantially boosts user engagement and retention.</p></li><li><p><strong>Deployment:</strong> Fully embedded in Netflix&#8217;s streaming platform; served via real-time APIs to power each user&#8217;s homepage.</p></li></ul><div><hr></div><h2><strong>2. Walmart E-commerce Search Optimization</strong> </h2><ul><li><p><strong>Business context:</strong> Walmart&#8217;s online store needed improved search results to boost conversions. Previously, basic keyword matches frequently showed irrelevant items.</p></li><li><p><strong>Tech/method:</strong> Machine learning&#8211;based search ranking: deep learning and NLP models trained on billions of past search queries and user click logs. 
Contextual embeddings and click-through data refine the search results.</p></li><li><p><strong>Metrics/results:</strong> After revamping with ML, Walmart saw a 20% increase in conversion rate from search traffic. In other words, far more users bought products after a search.</p></li><li><p><strong>Deployment:</strong> Integrated into Walmart&#8217;s e-commerce platform (Walmart Labs), updating in real time as new products and queries are added.</p></li></ul><div><hr></div><h2><strong>3. Demand Forecasting &amp; Inventory Optimization (Sam&#8217;s Club)</strong> </h2><ul><li><p><strong>Business context:</strong> Sam&#8217;s Club (Walmart) needs to forecast product demand across stores and distribution centers to improve inventory, pricing, and promotions. Different teams used to create isolated forecasts.</p></li><li><p><strong>Tech/method:</strong> A cloud-based Centralized Forecasting Service utilizing statistical and ML models (e.g., gradient boosting and recurrent neural networks) on historical sales, promotions, seasonality, and weather data. All departments share a unified forecasting pipeline.</p></li><li><p><strong>Metrics/results:</strong> The unified system enhances forecast accuracy and consistency. More precise forecasts decrease excess stock and stockouts. For example, Walmart reported a 10% reduction in excess inventory, a 15% increase in on-shelf availability, and approximately $1&#8239;billion in holding cost savings over 12 months.</p></li><li><p><strong>Deployment:</strong> Deployed on Google Cloud Platform, it offers automated, real-time forecasts on demand. 
Teams can trigger forecasts through APIs and dashboards, ensuring all decisions rely on a single source of truth.</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/14-portfolio-projects-that-demonstrate?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/14-portfolio-projects-that-demonstrate?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h2><strong>4. Personalized Marketing &amp; Offers (Target Guest ID System)</strong></h2><ul><li><p><strong>Business context:</strong> Target gathers both online and in-store customer data. It needed to transform this data into personalized promotions and emails to boost sales.</p></li><li><p><strong>Tech/method:</strong> Real-time ML models that leverage a customer&#8217;s purchase history, demographics, app usage, and social sentiment. Techniques include ensemble learning for propensity scoring and multi-armed bandits for selecting email content.</p></li><li><p><strong>Metrics/results:</strong> In 2023, Target reported that 50% of its digital sales were driven by ML-powered personalization. Personalized emails and in-app suggestions (e.g., dynamic homepage feeds) significantly boosted conversion rates and basket size.</p></li><li><p><strong>Deployment:</strong> Models operate on Google Cloud and Kubernetes, integrated with the e-commerce front end and email marketing systems. A feature store and retraining pipelines ensure models stay updated with live loyalty and browsing data.</p></li></ul><div><hr></div><h2><strong>5. 
Supply Chain &amp; Workforce Optimization (Target SCOL Project)</strong> </h2><ul><li><p><strong>Business context:</strong> Beyond marketing, Target used ML in its supply chain, such as predicting local demand spikes (from events or weather) and optimizing restocking and staff schedules.</p></li><li><p><strong>Tech/method:</strong> The Supply Chain Optimization Lab (SCOL) developed regression and time-series models using POS data, store traffic, and external data. It also employs classification to generate demand surge alerts.</p></li><li><p><strong>Metrics/results:</strong> These ML initiatives led to notable efficiency improvements: a 12% decrease in out-of-stock items, 20% fewer overstocks, and an 18% boost in labor cost efficiency (better matching staff levels to customer demand).</p></li><li><p><strong>Deployment:</strong> Models are deployed through Airflow and Kubeflow pipelines on Google Cloud. In production, they provide alerts to store management dashboards and automate ordering systems.</p></li></ul><div><hr></div><h2><strong>6. Gaming Hardware Recommender (Razer)</strong></h2><ul><li><p><strong>Business context:</strong> Razer&#8217;s online store serves 175 million users with a variety of gaming devices. Razer aimed to increase cross-sells and up-sells by recommending compatible products, such as suggesting a gaming mouse based on a user&#8217;s PC setup.</p></li><li><p><strong>Tech/method:</strong> They used AWS&#8217;s Amazon Personalize (an ML recommendation service) for user segmentation and filtering. The solution was trained on user-device configurations and purchase history.</p></li><li><p><strong>Metrics/results:</strong> This system achieved a click-through rate 10&#215; higher than industry benchmarks, generating significant additional revenue through customized accessory recommendations.</p></li><li><p><strong>Deployment:</strong> The model operates on Razer Synapse (their configuration utility). 
Recommendations are served both in batch (via email campaigns) and in real time (on the website), and the model is continuously retrained as user inventories evolve.</p></li></ul><div class="directMessage button" data-attrs="{&quot;userId&quot;:6000855,&quot;userName&quot;:&quot;Cornellius Yudha Wijaya&quot;,&quot;canDm&quot;:null,&quot;dmUpgradeOptions&quot;:null,&quot;isEditorNode&quot;:true}" data-component-name="DirectMessageToDOM"></div><div><hr></div><h2><strong>7. Event Recommendation Newsletter (Ticketek)</strong></h2><ul><li><p><strong>Business context:</strong> Ticketek, a live-event ticketing platform, had 4 million subscribers but only sent out generic, state-based newsletters. It aimed to boost sales of smaller events like concerts and sports by matching customers with relevant shows.</p></li><li><p><strong>Tech/method:</strong> Using Amazon Personalize, Ticketek developed a recommendation engine that factors in a user&#8217;s past purchases, browsing history, and event metadata. Every week, it produces personalized event recommendations.</p></li><li><p><strong>Metrics/results:</strong> After the launch, the purchase rate from the newsletter rose by 250%, and tickets sold per newsletter open increased by 49%. These gains suggest that better-targeted recommendations lifted both engagement and sales.</p></li><li><p><strong>Deployment:</strong> The recommender is hosted on AWS, and its outputs are integrated into Ticketek&#8217;s email system. Personalized newsletters are automatically generated and sent, and real-time REST APIs provide suggestions on the website as well.</p></li></ul><div><hr></div><h2><strong>8. Sports Media Personalization (Pulselive)</strong> </h2><ul><li><p><strong>Business context:</strong> Pulselive, a digital partner for sports clients, needed to customize video highlights for fans of major football clubs and events. 
Generic video pages were not performing well.</p></li><li><p><strong>Tech/method:</strong> Also built on Amazon Personalize: Pulselive feeds user clickstream and team preferences into an ML model that ranks live match clips and news items.</p></li><li><p><strong>Metrics/results:</strong> For a leading European football client, personalized recommendations boosted video consumption by 20% across web and mobile platforms. Fans interacted more with content when it was customized to their favorite teams and topics.</p></li><li><p><strong>Deployment:</strong> The model runs on AWS, and its outputs plug into the Pulselive platform. Content is delivered through a personalized video carousel on the club&#8217;s website and app, with a feedback loop for ongoing learning.</p></li></ul><div><hr></div><h2><strong>9. Fraud Detection at Scale (Mastercard)</strong> </h2><ul><li><p><strong>Business context:</strong> Mastercard handles millions of transactions each minute. It needed to enhance fraud detection and cut down on false alerts to better protect merchants and cardholders.</p></li><li><p><strong>Tech/method:</strong> Using a combination of AWS AI/ML services and graph analysis, Mastercard trains models on transaction patterns. Graph algorithms identify rings of suspicious accounts, while real-time scoring flags anomalous payments.</p></li><li><p><strong>Metrics/results:</strong> The new system increased the detection of fraudulent transactions threefold while reducing false positives tenfold. This accuracy saves merchants billions in chargeback costs and enhances customer trust.</p></li><li><p><strong>Deployment:</strong> The ML models operate in the cloud, processing streams of transactions. When a transaction is flagged, it&#8217;s either declined or sent for additional verification. 
The AI functions as part of Mastercard&#8217;s global authorization pipeline.</p></li></ul><div class="community-chat" data-attrs="{&quot;url&quot;:&quot;https://open.substack.com/pub/cornellius/chat?utm_source=chat_embed&quot;,&quot;subdomain&quot;:&quot;cornellius&quot;,&quot;pub&quot;:{&quot;id&quot;:37262,&quot;name&quot;:&quot;Non-Brand Data&quot;,&quot;author_name&quot;:&quot;Cornellius Yudha Wijaya&quot;,&quot;author_photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!eEx-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F583076b2-657b-44bf-8aa9-9263e5bf04f0_544x544.png&quot;}}" data-component-name="CommunityChatRenderPlaceholder"></div><div><hr></div><h2><strong>10. Conversational AI Chatbot (International Financial Services)</strong></h2><ul><li><p><strong>Business context:</strong> Customers needed 24/7 support for routine questions like account info and policy details, but call centers were costly. The chatbot project aimed to reduce expenses and improve service speed.</p></li><li><p><strong>Tech/method:</strong> A conversational AI built with modern NLP platforms such as Rasa, Dialogflow, or custom LLMs trained on historical support tickets. Key components include intent classification, entity extraction, and dialogue management.</p></li><li><p><strong>Metrics/results:</strong> The bot saved &#8364;2 million annually by handling support tasks automatically. In practice, only about 6% of chats needed a live agent handoff, indicating that the bot resolved most conversations on its own. Customer satisfaction also increased due to immediate responses.</p></li><li><p><strong>Deployment:</strong> Integrated into the company&#8217;s website and mobile app, the chatbot system operates on cloud infrastructure with an orchestration layer that logs performance. Analytics dashboards track resolution rates, and the models are updated iteratively.</p></li></ul><div><hr></div><h2><strong>11. 
Route Optimization (UPS ORION)</strong></h2><ul><li><p><strong>Business context:</strong> UPS delivers 16.9 million packages daily using about 100,000 vehicles. Even small routing improvements can lead to significant savings.</p></li><li><p><strong>Tech/method:</strong> The ORION system applies advanced combinatorial optimization and heuristics (a customized &#8220;traveling salesman&#8221; solver) to vehicle telematics and delivery data. It combines historical driver knowledge with real-time constraints.</p></li><li><p><strong>Metrics/results:</strong> By 2016, ORION had eliminated approximately 100 million miles of driving per year, saving over 10 million gallons of fuel and roughly $300&#8211;400 million annually. UPS notes that cutting just one mile per driver per day can save about $50 million a year overall.</p></li><li><p><strong>Deployment:</strong> ORION is integrated into UPS&#8217;s fleet management software. It creates daily optimized routes for drivers at over 1,000 facilities. Drivers get the updated routes on in-cab devices, and the system keeps learning from feedback.</p></li></ul><div><hr></div><h2><strong>12. Data Center Cooling Optimization (Google DeepMind)</strong> </h2><ul><li><p><strong>Business context:</strong> Data centers use a lot of power for cooling. Even Google&#8217;s highly efficient facilities benefit from small improvements in PUE (power usage effectiveness). Cutting down energy consumption lowers operational costs and reduces carbon footprint.</p></li><li><p><strong>Tech/method:</strong> DeepMind developed an ensemble of deep neural networks to forecast future PUE and data center temperatures. A controller model then suggests setpoint adjustments. The system was trained using historical operating data.</p></li><li><p><strong>Metrics/results:</strong> In live A/B tests, the AI controller reduced cooling costs by 40% while keeping all systems operating safely. 
For Google, this resulted in millions of dollars in savings and a notable decrease in emissions.</p></li><li><p><strong>Deployment:</strong> The model operates within Google&#8217;s data center management software. It continually processes real-time sensor data (via Dataflow/Flink), predicts results, and independently fine-tunes equipment like chillers and fans.</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2><strong>13. Centralized Retail Forecast Platform (Walmart Sam&#8217;s Club)</strong> </h2><ul><li><p><strong>Business context:</strong> Previously, different teams at Walmart (pricing, marketing, supply chain) each ran separate demand forecasts, leading to inconsistent planning.</p></li><li><p><strong>Tech/method:</strong> Sam&#8217;s Club developed a centralized forecasting service on Google Cloud where any team can request forecasts. It uses standardized ML pipelines and trusted feature sets, ensuring all forecasts share the same data and models.</p></li><li><p><strong>Metrics/results:</strong> Centralization significantly enhanced the consistency and speed of forecasting. By utilizing shared, audited datasets and models, teams coordinated strategies and minimized redundant efforts. The system reduces overhead and accelerates decision-making; indirectly, it also lowers inventory risk and manual workload.</p></li><li><p><strong>Deployment:</strong> A cloud-hosted API where users submit parameters (region, SKU, time window). The backend runs time-series ML models and returns predictions. This service spans merchandising, finance, and operations.</p></li></ul><div><hr></div><h2><strong>14. 
AI Visual Inspection in Manufacturing</strong></h2><ul><li><p><strong>Business context:</strong> Manufacturers require near-perfect defect detection. Manual inspection is slow and error-prone, especially for critical products like steel slabs or components.</p></li><li><p><strong>Tech/method:</strong> Deep learning computer vision models (CNNs) analyze images from high-resolution cameras on the production line. Trained on labeled defect/no-defect samples, the system detects cracks, dents, misalignments, and more.</p></li><li><p><strong>Metrics/results:</strong> In a steel mill case, defect detection accuracy improved from about 70% (manual inspection) to over 98%. Precision reached 99.8%. The AI saved over $2 million annually, with a 1,900% ROI in the first year. Across various examples, factories report approximately 28% less downtime and a 15&#8211;20% reduction in costs from deploying AI inspection.</p></li><li><p><strong>Deployment:</strong> Cameras and edge processors are installed along the production line. The vision models operate in real time, displaying defects on an operator dashboard. Integration with MES/ERP systems automatically triggers hold or rework workflows. Continuous retraining addresses new defect types.</p></li></ul><div><hr></div><p>These are 14 different projects grounded in real-world applications with clear strategic value. I hope they inspire your own data science projects.</p><p>Like this article? 
Don&#8217;t forget to share and comment.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/14-portfolio-projects-that-demonstrate?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/14-portfolio-projects-that-demonstrate?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nb-data.com/p/14-portfolio-projects-that-demonstrate/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nb-data.com/p/14-portfolio-projects-that-demonstrate/comments"><span>Leave a comment</span></a></p>]]></content:encoded></item></channel></rss>