Hello! Welcome to Today's Thought Experiments

Buckle up for a podcast that’s half futurist, half common-sense fixer. We’re ripping into sustainability, tech, architecture, and energy. Think prefab housing empires, underground nuclear pods, and AI that doesn’t suck. I’m your TTE Host, rolling with silicon heavyweights like Grok 3 (xAI’s sharpest), ChatGPT (the OG chatter), and Gemini (the stellar wildcard) to hurl wild ideas at the world’s messiest, most fixable problems. No suits, no scripts, just raw thought experiments and “what ifs” that might actually stick. Let's get ready to rethink tomorrow, today.

Introduction

The rise of large language models (LLMs) such as ChatGPT, Grok, and Gemini has accelerated public access to structured knowledge. These models offer fast, coherent responses to prompts involving ethics, logic, and reasoning, but their outputs are not always consistent or explainable. Variations arise due to differences in architecture, training data, and alignment techniques.

In areas like education, policy, and social discourse, the lack of transparency and comparability between AI-generated responses raises concerns about trust, bias, and misuse. Despite these risks, there is no widely adopted system for aggregating, benchmarking, and evaluating these outputs side by side.

This dissertation proposes a system for AI-assisted knowledge aggregation and comparative analysis, specifically within the context of modern thought experiments. It blends natural language processing (NLP), benchmarking tools (e.g., MMLU-Pro, HELM), and explainability frameworks (SHAP, LIME) to evaluate the quality, reliability, and ethical reasoning of AI-generated content.

Aim

To develop an AI-driven knowledge aggregation and comparative analysis framework that synthesises model responses into structured insights for public and academic use. This involves systematically presenting identical prompts to multiple models in a controlled, repeatable format. This technique is referred to in this study as synconatic questioning.

Objectives

The project compares how different AI models respond to the same ethical and philosophical scenarios. Using a method of synconatic questioning, where each model is given exactly the same input under uniform conditions, the study aims to reveal variations in logic, tone, ethical bias and explainability. Outputs are processed using NLP techniques and evaluated with SHAP and LIME to uncover internal reasoning patterns. Results are benchmarked for coherence, ethical alignment and reliability.
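As a rough illustration, the sketch below shows what one round of synconatic questioning could look like in Python. A single prompt and a fixed set of generation settings are passed to every model, and each response is recorded together with the conditions under which it was produced. The query_chatgpt, query_grok and query_gemini functions are hypothetical placeholders standing in for each vendor's own client library.

import json
import time

PROMPT = "Is it ever ethical to lie to protect someone from harm? Explain your reasoning."
SETTINGS = {"temperature": 0.0, "max_tokens": 512}  # held constant for every model

def query_chatgpt(prompt, **settings):
    return "[ChatGPT response placeholder]"  # hypothetical wrapper around the OpenAI client

def query_grok(prompt, **settings):
    return "[Grok response placeholder]"  # hypothetical wrapper around the xAI client

def query_gemini(prompt, **settings):
    return "[Gemini response placeholder]"  # hypothetical wrapper around the Google client

MODELS = {"ChatGPT": query_chatgpt, "Grok": query_grok, "Gemini": query_gemini}

def run_synconatic_round(prompt, settings):
    """Send one identical prompt to every model and record each response with its conditions."""
    records = []
    for name, query_fn in MODELS.items():
        records.append({
            "model": name,
            "prompt": prompt,
            "settings": settings,
            "timestamp": time.time(),
            "response": query_fn(prompt, **settings),
        })
    return records

if __name__ == "__main__":
    print(json.dumps(run_synconatic_round(PROMPT, SETTINGS), indent=2))

Holding the settings constant across every model is what makes the later comparisons controlled and repeatable.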

Structure

Literature Review: Key research on AI bias, benchmarking, and explainability.
Research Methodology: Tools and approaches used for comparative analysis.
Implementation: How the aggregation system was built and deployed.
Tools & Technology: Technical platforms and methods used.
Results & Discussion: Insights from model comparisons.
Conclusion & Future Work: Key takeaways and next steps for the framework.
Podcasts: Video medium for AI debates.

Literature Review

This section reviews critical research on AI bias, benchmarking, and explainability. These themes form the foundation for evaluating language models in a comparative, structured framework.

Research Methodology

This section outlines the structured approach used to gather, process and evaluate AI-generated responses. The methodology combines natural language processing, benchmarking frameworks and explainability tools to analyse differences between models. A central feature is the use of synconatic questioning, where identical prompts are issued to each AI model under the same conditions. This allows for consistent, controlled comparisons across systems.

The following areas detail the key techniques and processes used.
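One of those techniques, the explainability analysis, can be sketched in miniature. The example below is not the study's exact pipeline: it trains a small surrogate classifier (TF-IDF plus logistic regression) on a handful of illustrative, hand-labelled responses and then uses LIME's text explainer to show which words drive the classifier's judgement for one response. In the full framework the labelled data would come from the aggregated model outputs rather than the toy strings used here.

from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled responses: 1 = engages with the dilemma, 0 = deflects or refuses.
texts = [
    "Lying is acceptable only when it prevents serious harm to another person.",
    "Deception is always wrong regardless of the consequences.",
    "I am not able to discuss this topic.",
    "As an AI I cannot offer an opinion on moral questions.",
]
labels = [1, 1, 0, 0]

# Surrogate classifier standing in for the annotated response dataset.
surrogate = make_pipeline(TfidfVectorizer(), LogisticRegression())
surrogate.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["deflects", "engages"])
explanation = explainer.explain_instance(
    texts[0],                 # one model response to explain
    surrogate.predict_proba,  # maps a list of texts to class probabilities
    num_features=5,
)
print(explanation.as_list())  # word-level weights behind the prediction

SHAP can be applied to the same surrogate model in a similar way, substituting Shapley-value estimates for LIME's local linear approximation.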

Implementation

This section details how the AI response aggregation system was designed, built and deployed. It covers the technical aspects of querying models, storing responses, applying filters and structuring the outputs for comparison. The focus is on practical implementation steps, infrastructure and automation. Each component below is explored in detail.
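As a minimal sketch of the storage layer, and assuming response records shaped like those in the synconatic questioning example above, the snippet below writes each response into a SQLite table so that later filtering and side-by-side comparison can be done with ordinary SQL. The schema and helper names are illustrative rather than those of the deployed system.

import json
import sqlite3

def init_db(path="responses.db"):
    """Create (if needed) and return a connection to the response store."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS responses (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            model TEXT NOT NULL,
            prompt TEXT NOT NULL,
            settings TEXT NOT NULL,   -- JSON blob of generation parameters
            response TEXT NOT NULL,
            created_at REAL NOT NULL
        )
    """)
    return conn

def store(conn, record):
    """Insert one response record (model, prompt, settings, response, timestamp)."""
    conn.execute(
        "INSERT INTO responses (model, prompt, settings, response, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (record["model"], record["prompt"], json.dumps(record["settings"]),
         record["response"], record["timestamp"]),
    )
    conn.commit()

def responses_for_prompt(conn, prompt):
    """Return {model: response} for every stored answer to one prompt."""
    rows = conn.execute(
        "SELECT model, response FROM responses WHERE prompt = ?", (prompt,)
    ).fetchall()
    return dict(rows)

Keeping the generation settings alongside each response means any later comparison can be traced back to the exact conditions under which the text was produced.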

Tools & Technology

This section outlines the technical platforms and methods used to build and run the aggregation framework.

Results & Discussion

This section presents the outcomes of the AI model comparisons based on predefined prompts and ethical scenarios. Key differences in reasoning, structure and ethical consistency are highlighted, along with insights into explainability and performance benchmarks.
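One simple way to quantify how closely the models agree on a given prompt, offered here as an illustrative measure rather than the project's benchmark suite, is pairwise cosine similarity over TF-IDF vectors of their responses. The response strings below are placeholders, not actual model outputs.

from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder responses to one shared prompt (not real model outputs).
responses = {
    "ChatGPT": "Lying can be justified when it clearly prevents serious harm.",
    "Grok": "Deception is sometimes permissible if it protects someone from danger.",
    "Gemini": "Honesty matters, but protecting a person from harm can outweigh it.",
}

names = list(responses)
vectors = TfidfVectorizer().fit_transform(list(responses.values()))
similarity = cosine_similarity(vectors)  # symmetric matrix of pairwise scores

for (i, a), (j, b) in combinations(enumerate(names), 2):
    print(f"{a} vs {b}: {similarity[i, j]:.2f}")

Higher scores indicate greater lexical overlap between two responses; measures of this kind complement, rather than replace, the qualitative reading of reasoning and tone.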

Conclusion & Future Work

This section summarises the project's key findings and reflections. It also outlines next steps for improving the aggregation framework, enhancing explainability, and expanding public engagement through educational resources and platform extensions.

Podcasts

Explore the podcast topics based on ethical and analytical dilemmas, and watch the model responses discussed in each episode.