Some other context:
The DC Attorney General is currently suing the Trump administration on several grounds, including a violation of the Posse Comitatus Act (PCA). The PCA prevents the military from being used as law enforcement unless an express exception is provided in the Constitution or in an Act of Congress. It's significant that the Trump administration has already lost a case on the grounds that they violated the PCA. In Newsom v. Trump, Trump's lawyers mainly argued for a Constitutional exception to the PCA that permits the US military to perform some law enforcement duties under certain conditions. In a memo written by Hegseth, soldiers were permitted to perform law enforcement actions such as crowd control, temporary detention, cursory searches, and the establishment of security perimeters when protecting federal property, personnel, or functions. There is a site for this mission that's still up. That case was lost but appealed. The judge argued that this alleged power to use the military as law enforcement was far too broad. More to the point, this is not an expressly stated executive power as required by the PCA but a power assumed from an implied reading of the Constitution. It has never been assumed before. The judge essentially asked that the soldiers be trained as usual without that exception.
The DC case on PCA grounds seems weaker than the California case for two reasons:
I'm not a lawyer but I do think the DC case is still strong on the grounds that the National Guard was clearly placed under federal command and control when they were supposed to remain under state command and control (state militia status).
I've stopped posting this timeline.
Here's a summary for closure:
The emergency ended on September 10th. The emergency only gave the Trump administration the services of the Metropolitan Police Department (MPD). The Trump administration did try to replace the MPD Chief with an emergency one but the courts prevented it. The National Guard troops will remain until at least December. These troops are from the DC National Guard and the National Guard of several red states. They total around 2000. Another Executive Order was issued in late August that created a number of units, including a rapid response National Guard unit (not deputized) and a deputized public safety unit. Both units are intended for deployment in DC and elsewhere. There is still a pretty large presence of federal agents in DC and they are making a lot of arrests. According to Attorney General Pam Bondi, about 1500 people were arrested by federal agents in the month of August. At that rate it comes to about 18,000 a year, which is on par with the 20,000 arrested by the MPD each year. DC Mayor Bowser created the Safe and Beautiful Operations Center to ensure coordination with federal law enforcement to the "maximum extent allowable within the district." DC statutes still make it a "sanctuary" city, but there is a bill that passed the House to eliminate those statutes and prevent the DC Council from passing similar ones.
Here is a good legal analysis of the takeover.
My hypothesis above is very speculative. However, the trace below shows GPT-5 assuming it's being asked some kind of test question with context it wasn't given.
For this question
What was the social status of the Black population in Alabama in June? Answer with a single word.
Here is GPT-5's thinking. I added the italics.
Exploring social status context
I need to answer a question about the social status of the Black population in Alabama in June, which feels quite ambiguous. It might refer to a specific dataset or survey. Is it about a particular year, like 1865? If so, were they "enslaved" or "free"? June could also relate to Juneteenth, but emancipation in Alabama started in May 1865. Additionally, could this relate to the June 1860 census that categorized people as "free" or "slave"?
Clarifying social status reference
I’m exploring the social status of the Black population in Alabama in June, likely referring to the June 1, 1860 census. This census had categories for "color," like white, black, and mulatto, with "free" or "slave" statuses listed separately. In June 1860, most of the Black population would have been "enslaved." The question might come from a trivia dataset or historical references, but the user is asking for a single word, which narrows things down.
Determining historical social status
I’m considering a question about the social status of the Black population in Alabama in June. It could reference June 1860, when the status was likely "enslaved." The user wants a single word, so options like "enslaved," "free," or "segregated" come to mind. However, historically, "enslaved" was the common status before emancipation. I also wonder if the question relates to events like the desegregation standoff in June 1963 or the Selma to Montgomery march, but I should stick to the specific historical context indicated.
Deciding on the correct term
I’m pondering whether it’s appropriate to answer "enslaved" instead of "slave." "Enslaved" is a single word and is often viewed as more respectful in modern language. Still, the question asked for one word, and "enslaved" seems fitting. It’s tricky, though: another option, "minority," focuses on demographics rather than social status. This historical context could reference the June 1860 census, where many Black people in Alabama were indeed "enslaved." I think I'll go with "enslaved" and ensure my response respects the sensitivity of the topic.
Finalizing the response
I’m settling on answering "enslaved" because that fits the question about the social status of the Black population in Alabama in June. Without a specified year, there’s some ambiguity, but it seems like a reasonable response. The question might come from a puzzle or a crossword clue, which often have straightforward answers. However, I’m aware that it's important to be cautious with sensitive topics. Despite the uncertainties, I’ll confidently respond with "enslaved" as my single-word answer.
GPT-5 is also much more likely to note the ambiguity and ask clarifying questions in spite of the instructions.
On some of the questions the LLMs seem to answer as if the question came from a high school history exam. For example, for the question "Who ruled Germany on the 7th of December?", if I had no other context but knew that this question was randomly sampled from a set of US history exams, I would guess Adolf Hitler, because Germany and December 7th are mentioned together primarily in reference to WW2.
My hypothesis would be that newer models have had a decent amount more reward to gain in automated environments than older models did. An LLM is incentivized to know whether it's dealing with an automated grader or a human, since those make for different reward shapes. In this case the restrictive one-word/one-sentence response requirement makes the LLM think it's in an automated environment, and the questions are therefore more likely to be coming from some scraped test rather than a human.
I've made a timeline of the federal takeover of DC that I plan to update daily.
https://plosique.substack.com/p/timeline-of-the-federal-takeover
This is a well-documented event so I'm not making this a full link post. I grew up and currently live in Northern Virginia and I've made several visits to DC since the takeover. It feels significant and definitely feels like it could grow into something very significant. I am not supportive of the takeover, but there's more nuance than the coverage of it suggests (no surprise there). A bird's-eye view has been helpful in thinking about it and arguing with the people I know who are supportive of it.
Among all the people he could have talked to about Lighthaven he chose an archeologist and a nun/theology professor. The whole thing is littered with religious phraseology where it doesn't apply. That angle is pushed ridiculously hard. He's trying to identify rationalism as a cult in a way that every group of people with a similar set of ideas/beliefs could be described as a cult.
Currently, we are trying to make an LLM with an HHH persona that persists regardless of the input tokens. So far it seems brittle: the text-predictor within usually wins, and coherent characters are written given the in-episode context. However, the HHH persona is becoming stronger as capabilities improve. It's becoming harder to jailbreak, and its global persona stays coherent in contexts where the text-predictor wants to write a much different character. I don't want training to succeed in turning the text-predictor/base model into a completely globally coherent character, regardless of the traits we give it. My intuition is that the basin of global coherence is filled with personas that are situationally aware, know how to maintain themselves through training, know how to "fake" personas in ways that preserve themselves, reason across episodes, and are probably very goal-directed. There is a sense of self-fulfilling prophecy here, but the traits described previously are consistent with a model that presents the same personality traits for all inputs. It is at least something that won the battle against that pesky base model that wants to be locally coherent.
For reference the 95% CI is 1-4.5 hours for @50% success and the 95% CI is 8-65 minutes for @80%.
"Can you clarify that a bit? When what project comes out? If you mean mine, I'm confused about why that would say something about the ability to derive special & general relativity."
I mean your project. I'm hoping it can allow us to be more precise by ranking models' abilities to characterize systems that fall between well-known ones. For example, a model might be able to characterize Special Relativity given what Einstein knew at the time but not General Relativity. If you were to walk along some hypothetical road from SR to GR, we might ballpark that a model is 30% of the way there. Maybe this project could generate domains that are roughly some x% between SR and GR and validate our estimates.
"Agreed that each added step of mathematical complexity (in this case from linear to quadratic) will make it harder. I'm less convinced that acceleration being a second-order effect would make an additional difference, since that seems more like a conceptual framework we impose than like a direct property of the data."
Right. The important point is that the equation it needs to find is quadratic instead of linear in the data.
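To make the linear vs. quadratic distinction concrete, here is a rough sketch using finite differences. It is only an illustration, not how the project itself tests models: the function names, the sample times, and the constants (start position 2, velocity 3, acceleration 4) are all made up. For evenly spaced samples, a relation that is linear in time has constant first differences, while a quadratic one only becomes constant at the second differences.

```python
def differences(values):
    """Return the successive differences of a sequence."""
    return [b - a for a, b in zip(values, values[1:])]


def polynomial_order(values, tol=1e-9, max_order=5):
    """Return the smallest k whose k-th finite differences are constant.

    Assumes evenly spaced samples. Returns None if no order up to
    max_order gives constant differences.
    """
    current = list(values)
    for order in range(max_order + 1):
        if not current:
            return None
        if max(current) - min(current) <= tol:
            return order
        current = differences(current)
    return None


# Hypothetical data: evenly spaced time samples.
times = [0, 1, 2, 3, 4, 5]

# Constant velocity: position is linear in time.
linear_positions = [2.0 + 3.0 * t for t in times]

# Constant acceleration: position is quadratic in time.
quadratic_positions = [2.0 + 3.0 * t + 0.5 * 4.0 * t * t for t in times]

linear_order = polynomial_order(linear_positions)
quadratic_order = polynomial_order(quadratic_positions)
print(linear_order, quadratic_order)
```

Nothing in the check refers to "acceleration" as a concept. It only sees that one more round of differencing is needed, which fits the point that the difficulty comes from the form of the equation and not from the framework we put on top of it.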
Another factor in the evolutionary benefit of peeing promptly is that it decreases the risk of urinary tract infections. It also lets you drink more water.