This bit is curious:
I apologize for the mistake in my previous response. Upon reviewing the datasheets for the MAX4936 and MAX4937
As I understand it, ChatGPT does not have internet access beyond being able to chat with you. Therefore it did not "review the datasheets". Its apparent self-awareness is no more reliable than its factual reliability.
Yep! That's something that I wrote in my original writeup:
Even when it claims to do so, [ChatGPT] doesn’t consult a datasheet or look up information — it’s not even connected to the internet! Therefore, what seems like “reasoning” is really pattern recognition and extrapolation, providing what is most likely to be the case based on training data. This explains its failures in well-defined problem spaces: statistically likely extrapolation becomes wholly untrue when conditioned on narrow queries.
My last comment about "self-awareness seems to be 100%" was a (perhaps non-obvious) joke; mainly that at least it is trained to recommend that it shouldn't be trusted blindly. But even this is a conclusion that isn't arrived at via "awareness" or "reasoning" in the traditional sense — again, it's just training data and machine learning.
I've been doing similar things in my day-to-day work, like making stuff in CSS/Bootstrap or Excel, and in my hobbies, like mucking about in Twine or VCV Rack, and have noticed:
If you treat it almost like a student and inform it of the errors/consequences of whatever it suggested, it's often surprisingly good at correcting the error. But this is where the differences in how much it "understands" domains like "CSS" vs. "Twine's Harlowe 3.3.4 macro format" become easier to see: it seems much more likely to make up functions and features of Harlowe that resemble things from more popular languages.
For whatever reason, it's really fun to engage it on things you have expertise in and correct it and/or rubber duck off of it. It gives you a weird child of expertise and outsider art.
This is a supplement to an article I wrote elsewhere, providing the complete transcript of an interaction with ChatGPT.
I evaluated ChatGPT's performance by tasking it with real-world problems I encounter in my work as an applications engineer.
Main takeaway: ChatGPT seems to perform well in general troubleshooting scenarios, even in domains which require considerable background knowledge. Failures occur with narrow queries that require specific technical data.
General Troubleshooting
ChatGPT can offer suggestions to solve common engineering issues, even regarding specific integrated circuit parts. As long as the solution space is large, it generates good recommendations for diagnosing and solving problems.
Narrow Prompts
Defined issues requiring a specific solution generate believable, in-depth answers which fail in their specificity. These errors are difficult to spot because the text conforms to what we expect from a truly intelligent response. (See commentary between prompts for the errors.)
Errors: In reality, the operating voltages and output currents are consistent between the two parts. Also, both devices come in TQFN packages (not SOT23 or SOIC) and have 56 pins (not 5 or 8).
Another datasheet-related question on a different part, just for good measure:
Error: The maximum data rate is 0.5 Mbps, not 1 Gbps.
Fabricating References
I hoped that I could ask ChatGPT for references which would allow me to decide whether I could trust its output. This failed spectacularly.
Actually, the datasheet does not contain this information. Additionally, when I informed ChatGPT of the error, it simply updated its guess to a new assumption about the wiring.
* Note: The link has since changed, but at the time, this was a valid link to the DS1402D datasheet.
Meta-Questions of Information Validity
We already know why ChatGPT hallucinates, but I wanted to see what reasons it would provide:
It may be factually unreliable, but its self-awareness seems to be 100%.