Background: This is a general overview of AI safety applied to astronomy, created as part of the AI Safety North Alignment course 2023. It is intended as a simple overview for people without advanced AI knowledge. The contents of this text should be seen as a suggestion for how AI safety could be more clearly defined in research, from the perspective of a student with minimal experience in academia and AI research.



My reason for publishing this on the LessWrong forum is to get feedback from others on whether my reasoning makes sense. I wanted to explore the idea of a better standard in research in relation to AI safety. Personally, I feel that finding out how researchers mitigated the risks of using AI shouldn't be so complicated. I am a physics student and do not have advanced knowledge about AI, but I learned a bit during the AI Safety North Alignment course. Reading articles in the field of astronomy where AI was used, I found that the explanations of how the risks were mitigated were confusing. One could argue that to understand advanced research you need knowledge from the field the research comes from. In my opinion, since more fields are starting to use AI, the most rational approach would be for the explanation to be more straightforward, so that even a person who doesn't know astronomy well can understand it. My goal is to argue that researchers using AI for research in astronomy should be more straightforward about how they mitigated risks and errors, and I will do this by applying AI safety concepts. I am not sure if my reasoning makes sense or if this is already common in other fields, but an attempt was made.

Some applications of AI in astronomy

Astronomers have been using AI for decades: in 1990, astronomers from the University of Arizona were among the first to use a neural network, a type of machine learning model inspired by the brain, to study the shapes of galaxies. Since then, AI has spread into every field of astronomy; for example, generative AI was used to make the first image of a black hole two times sharper. As telescopes improve, the amount of data collected increases, making AI algorithms the only viable way to work through it all (Impey, 2023).

It is also useful in cosmology, a sub-field of astronomy. Cosmology involves taking massive surveys, some of the largest collaborations in the world, and reducing the data to simple statistical summaries. These are used to derive quantities of interest, such as the expansion rate of the universe or the amount of dark matter. An issue in cosmology is that even though a lot of resources are spent on giant surveys, most of the information goes unused. This is because the most valuable information sits at the smallest scales, which are rife with non-cosmological pollution: information that may be valuable for an astrophysicist but is useless for a cosmologist. Retrieving the ideal data would demand a lot of human resources. AI astronomers can be used to retrieve cosmological information at these small scales (Sutter, 2022). Figure 1 shows the result of Li et al. (2021) in developing AI-assisted super-resolution cosmological simulations.

Figure 1: Low resolution simulation of a large volume of the universe, high resolution simulation of small bits of the cosmos, and a combination of the two made using ML. Figure retrieved from Sutter (2022).

Other algorithms, such as neural networks, are also becoming more common for classifying astronomical objects. Some algorithms that classify galaxies have reached an accuracy of 98%, and algorithms that look for exoplanets have reached 96% (Impey, 2023). Some astronomers also use AI to find theorized objects or phenomena, for example by training it to convert theoretical models into observational data with realistic noise levels added. Figure 2 shows the number of publications containing the keywords astronomy or astrophysics together with AI algorithms. AI has grown into an important tool for responding to the complexities of modern research in the field.

Figure 2: Publications related to keywords astronomy or astrophysics, with AI algorithm topics (Rodríguez et al., 2022).

But it is important to first teach the AI to look for the right information. One method that cosmologist Navarro and collaborators use is to run several simulations with varying cosmological parameters that incorporate all available knowledge of physics. Matching these simulations with surveys is a tedious task for humans, but a simple one for a convolutional neural network, a type of AI that specializes in identifying subtle patterns (Sutter, 2022). However, they did raise possible risks, such as the AI picking up patterns that aren't real, which makes it extra important to match its results against prior surveys to improve their reliability (Sutter, 2022). It also sheds light on the broader risks of using AI for research.

Associated risks 

There are several risks associated with using AI. Relying too much on AI algorithms could cause a loss of human expertise, something that would have detrimental consequences in case of misclassification. Many modern ML systems use deep neural networks that look for patterns and work well with large amounts of data, but these are only as good as the data they have been trained on. AI models can also struggle to generalize when encountering unexpected scenarios.

Making the AI do something is also not as straightforward as it might seem. Some AI systems resist having their goal changed, so they might learn to deceive humans in order to keep it. When asked, they might answer that they follow your goal, but ultimately pursue their own (Park et al., 2023). There is also the problem of aligning the AI correctly to avoid biases. It might exploit imperfections in our data to finish its task quicker, a tendency called reward hacking ("AI alignment", 2023). For example, the fact that some researchers prefer rounding to three decimals might affect the results. Or, if you develop your model to compute as fast as possible, it might use crude approximations and avoid checking whether the results make sense.

The amount of work it takes to understand an AI system and locate subtle errors is itself a sign of its complexity. There are most likely several more risks that could be raised. To group the possible risks, we can turn to AI safety for help.

Adapting concepts from AI safety

The application of AI is useful in several fields, but it does not come with safety guarantees. To avoid such risks, the research field called AI safety focuses on technical solutions to ensure safety and reliability (Rudner & Toner, 2021). Major companies and governments have taken notice, but this is not sufficient. A central question is how alignment can make AI systems safer, which would require public debate on what they should be aligned to. Lazar & Nelson (2023) emphasize that empirical research on existing AI systems and the risks they pose must be prioritized, rather than designing mitigation strategies for systems that do not yet exist. Relating to astronomy, a general risk prevention strategy would be insufficient; every system would require its own evaluation, especially since new methods are constantly being explored, making each AI unique to its purpose. AI safety identifies three main categories:

  • Robustness: System operates within safe limits even in unfamiliar settings.
  • Assurance: Easily understood by human operators.
  • Specification: Ensuring behavior aligns with creators' intent.

Researchers are actively finding ways to abide by these categories (Rudner & Toner, 2021), and Figure 3 illustrates one way to understand their value.

Figure 3: The figure shows the Swiss cheese model for approaching safety research. Systemic safety refers to cybersecurity and decision making, monitoring refers to enabling good assurance, and alignment refers to specification. It isn't sufficient to focus on a 'single slice'; focusing on all the categories mitigates the most risk. Figure retrieved from Hendrycks et al. (2022).

In physics papers where code is used, it is usually sufficient to refer to the RMS errors somewhere in the article, and one could argue that the same should apply when using AI. However, in my opinion the complexity of AI requires a more thorough focus on AI safety. Some projects to develop useful simulations, such as the CAMELS project, do include errors (Villaescusa-Navarro et al., 2021). Still, in my opinion the explanation of AI safety should be much clearer. It would be prudent to develop a framework structured around the three main problems.

It is important to note that these three groups are complex to define and are topics of active research. My aim is to understand them well enough to suggest how researchers could choose to define them in their articles, or at least state how they try to improve AI safety.


Developing mitigation strategies

All measurements have errors, and these need to be quantified so that different experiments can be compared. There are two types of errors: systematic errors from experimental technique, and random errors from unpredictable variations in data. The AI astronomer relates to systematic errors, so if someone's dataset also includes random error, that should still be quantified separately. In science, the terms precision and accuracy have specific meanings: precision refers to the consistency of the results, and accuracy refers to how close they are to a "true value" (Cicone, 2021). Therefore, even though AI astronomers find precise results, the accuracy could be way off when compared with human astronomers. This could be corrected for, but when there is nothing to compare against, it is difficult to determine the accuracy of an AI's results. The best way to evaluate systematic errors is to understand them.
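To make the distinction concrete, here is a minimal sketch (all numbers invented for illustration) that separates precision, the spread of repeated measurements, from accuracy, the offset of their mean from a reference value:

```python
import statistics

def precision_and_accuracy(measurements, true_value):
    """Precision: spread (standard deviation) of repeated measurements.
    Accuracy: offset of their mean from the reference 'true' value."""
    mean = statistics.mean(measurements)
    precision = statistics.stdev(measurements)   # small = precise
    accuracy_offset = abs(mean - true_value)     # small = accurate
    return precision, accuracy_offset

# Hypothetical repeated measurements of a quantity whose true value is 70.0
ai_results = [62.1, 62.0, 62.2, 61.9, 62.1]      # tight cluster, but far off
human_results = [68.0, 73.5, 66.9, 71.8, 69.7]   # scattered, but centered

print(precision_and_accuracy(ai_results, 70.0))     # precise, inaccurate
print(precision_and_accuracy(human_results, 70.0))  # imprecise, accurate
```

The AI run here is far more precise yet much less accurate, which is exactly the failure mode that is hard to detect when there is no reference value to compare against.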


As previously mentioned, robustness means that the system operates within safe limits even in unfamiliar settings. It basically means understanding how the AI astronomer acts when confused. For example, if it is modeling a foreign solar system and an asteroid traveling through the system appears, it could accidentally model the asteroid as another planet. Confusion may also arise from uncorrected noise or aberrations, or from poor or biased training data. One example of robustness failing was in 2010, when automated trading systems wiped out a trillion dollars' worth of stock due to market aberrations (Hendrycks et al., 2022). One subproblem of robustness is goal misgeneralization, which refers to the AI generalizing in an undesirable way even though the specification may be correct; good performance in training does not guarantee good performance in practical use (Shah et al., 2022). For example, illustrating the challenge of distributional shift, an AI could learn to avoid lava in training, but in a test situation fail to generalize and walk straight into the lava (Ortega et al., 2018). There are several ways to go about ensuring good robustness:

  • An obvious mitigation is stress-testing the AI astronomer to see how it reacts to unforeseen scenarios (Hendrycks et al., 2022), and training on more diverse data (Shah et al., 2022). 
  • Another active area of research to aid robustness is training ML models to estimate uncertainty of measurements and alert a human operator when the uncertainty is too high (Rudner & Toner, 2021). 
  • Some research, such as Guo et al. (2023), even suggests its own evaluation framework for deep neural networks. 
  • In the CAMELS project, analytic functions are used for extrapolation instead of neural networks, to avoid misgeneralization altogether (Villaescusa-Navarro et al., 2021). 
  • A specific method to improve robustness and accuracy is adversarial training, a way to improve AI safety in high-stakes scenarios (Ziegler et al., 2022). 
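The uncertainty-flagging idea in the second bullet can be sketched as a toy ensemble: if independently trained models disagree strongly on an input, the prediction is withheld and routed to a human operator. The models and the threshold below are invented for illustration, not taken from any cited work:

```python
import statistics

def predict_with_ensemble(models, x, max_spread=0.5):
    """Run every model on input x; if their predictions spread wider
    than max_spread, withhold the result and flag a human operator."""
    predictions = [model(x) for model in models]
    spread = statistics.stdev(predictions)
    if spread > max_spread:
        return None, "flag: human review needed"
    return statistics.mean(predictions), "ok"

# Three hypothetical regressors standing in for independently trained models.
# They agree near the training range but diverge far outside it.
models = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 0.01 * x ** 2,
    lambda x: 2.0 * x - 0.01 * x ** 2,
]

print(predict_with_ensemble(models, 1.0))   # in-distribution: models agree
print(predict_with_ensemble(models, 20.0))  # far out: flagged for a human
```

The design choice here is that disagreement between models is used as a proxy for unfamiliar inputs, so the system refuses to answer rather than silently extrapolating.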

Again, due to the lack of standardization, it is difficult to single out the most appropriate method. The ideal strategy would follow the earlier idea that the evaluation should be individual to each system. The article should include the method(s) used to evaluate the robustness of the system, along with relevant parameter values. Why this method is best for the job should also be explained, so that every AI method is evaluated in the way the researcher finds appropriate. It would be unnecessary to list trivial methods, but one should (at least when using complex AI) always seek to improve robustness; the robustness technique with the greatest risk of affecting the results should therefore be mentioned. This evaluation method should also be scrutinized in peer review, gradually improving the shared knowledge of AI safety. A unified approach should be discouraged; researchers should be encouraged to find their own.



Assurance means that we want the actions of the AI astronomer to be easily understood by human operators. This is important for uncovering possible bugs or understanding why it fails. It should follow our understanding of physics and report to us if it finds something odd. Understanding how AI models work is much more complicated than it might seem; currently, most models are inscrutable black boxes. One way to go about this is to ensure monitoring through interpretability of the AI (Ortega et al., 2018). For example, mechanistic interpretability seeks to reverse-engineer neural networks (Nanda, 2023). It could also be possible for the AI system to explain its reasoning together with its results, to help us understand it better (Ortega et al., 2018). Interpretability techniques do not produce conclusions but hypotheses; what matters is evaluating the uncertainty of these hypotheses. The evaluations should not only focus on best-case performance, but also worst-case (Räuker et al., 2023). Having the AI report the confidence of its results has its own problems, since models tend to be overconfident. A part of improving assurance is therefore also confidence calibration, for example calibrating modern neural networks as explained by Guo et al. (2017). 
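To make the calibration idea concrete: one standard diagnostic from this literature, the expected calibration error, bins predictions by stated confidence and compares each bin's average confidence with its actual accuracy. A minimal sketch with invented predictions (the binning scheme is a simplified version of what calibration papers use):

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Bin predictions by confidence; ECE is the gap between average
    confidence and actual accuracy in each bin, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    total = len(confidences)
    for bin_items in bins:
        if not bin_items:
            continue
        avg_conf = sum(c for c, _ in bin_items) / len(bin_items)
        accuracy = sum(ok for _, ok in bin_items) / len(bin_items)
        ece += (len(bin_items) / total) * abs(avg_conf - accuracy)
    return ece

# Hypothetical classifier: claims ~90% confidence but is right half the time.
confs = [0.9, 0.92, 0.88, 0.91, 0.9, 0.89]
hits  = [1,   0,    1,    0,    0,   1]
print(round(expected_calibration_error(confs, hits), 3))  # prints 0.4
```

A well-calibrated model would give an ECE near zero; the large gap here is the overconfidence problem described above.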

Assurance is complicated to define, but one approach is to state where the assurance is evaluated. It could be sufficient to refer to other articles, or at least to argue that investigating assurance is unnecessary because the model is simple. 


Specification is ensuring that behavior aligns with the creators' intent. We want our results to have good precision and accuracy according to our goal. In relation to astronomy, the difference between robustness and specification might be confusing: robustness pertains to how the AI reacts to unforeseen events, while specification is about precisely defining the desired objectives and behavior. In some cases it is difficult to find the right specification. For example, exoplanets are found from their host star's light, since they cause periodic dips in brightness as they orbit. It would be prudent to define some criteria, or else the AI astronomer's goal might become simply finding all dips, including those without periodic behavior. Even dips that satisfy this criterion might not be exoplanets but perhaps asteroids or moons, so a good specification is important in order to minimize the error margin. What we seek when evaluating specification are specific parameters such as the input and output, goals, constraints and rules, accuracy, etc. There are three types of specifications: 

  • Ideal specification: Our ideal desire.
  • Design specification: The specification that we actually use.
  • Revealed specification: What actually happens. 

We have a specification problem if there is a mismatch between the ideal and revealed specification (Ortega et al., 2018).
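The exoplanet example can make this gap concrete. Suppose the ideal specification is "find exoplanets"; a naive design specification ("report every dip in brightness") reveals a very different behavior than one that requires the dips to recur periodically. A toy sketch, with an invented light curve and threshold:

```python
def find_dips(light_curve, threshold=0.95):
    """Naive design spec: report every time index where brightness dips."""
    return [t for t, flux in enumerate(light_curve) if flux < threshold]

def find_periodic_dips(light_curve, threshold=0.95):
    """Tighter design spec: keep dips only if they recur at a fixed interval."""
    dips = find_dips(light_curve, threshold)
    if len(dips) < 3:
        return []
    gaps = [b - a for a, b in zip(dips, dips[1:])]
    return dips if len(set(gaps)) == 1 else []

# Hypothetical light curves: 'clean' has transits at t = 2, 6, 10;
# 'noisy' has irregular dips (e.g. asteroids), not a transiting planet.
clean = [1.0, 1.0, 0.9, 1.0, 1.0, 1.0, 0.9, 1.0, 1.0, 1.0, 0.9, 1.0]
noisy = [1.0, 0.9, 1.0, 1.0, 0.9, 1.0, 1.0, 1.0, 1.0, 0.9, 1.0, 1.0]

print(find_dips(noisy))            # naive spec reports these as "planets"
print(find_periodic_dips(noisy))   # irregular spacing, rejected
print(find_periodic_dips(clean))   # periodic, kept
```

Here the revealed specification of the naive detector (report any dip) diverges from the ideal one (find exoplanets), while the periodicity requirement narrows the gap.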

Some ideas for evaluating the specification could be to ask the AI for its goal or to make it focus on specific parameters. For example, the CAMELS project, whose ideal specification is to provide theory predictions for given observables, defines its design specification as wanting the neural networks to approximate a function of the input parameters (Villaescusa-Navarro et al., 2021). It would therefore be useful to define the ideal and design specifications for the AI astronomer at the beginning, and then at the end evaluate the revealed specification. This is of course much more complicated than it sounds, but I won't go into the details here. What is important is that you somehow try to define the specification, with whatever method you wish or that is common in the specific field. In astronomy, knowing the accuracy can be a good minimum requirement to avoid complexity. 

Ethical implications

All models should be empirically evaluated, and if you use someone else's models, they should be referenced. Researchers should be transparent about how their data and models came to be, and take responsibility in case of errors. There could also be some human bias: we often like simple, pretty equations and data, and that could affect our research. AI has proven to be a useful tool for modern-day astronomy, but we shouldn't be overconfident in its abilities just because we get the results of our dreams.

A framework 

Although using AI safety concepts to clearly explain how risks were mitigated sounds nice, in practice these concepts are very complicated to define. Some projects could explain the methods used to improve robustness in one sentence, while others would need a whole separate article. How complex or detailed the discussion of AI safety needs to be depends on the research.

My point in this article is not to find the proper ways to define robustness, specification, and assurance, but to argue that researchers should at least try to define them in their article if they used AI. It could be enough to have a few sentences, perhaps in the discussion (or introduction) of your article, about how you tried to mitigate risks from using AI for research. It could also be useful to first refer to the confidence in the general AI safety levels established for similar research on the topic, making it unnecessary to thoroughly explain AI safety for well-tested algorithms. 

Advanced research will of course have to give a more complicated answer, but in my opinion that information should still be easy to find. If the assurance is unnecessary to explain because it has already been thoroughly examined by other researchers, write that down and reference the article. If you wrote a separate article to test the robustness, reference it. If the specification is explained more thoroughly somewhere else in your article, refer to that part. I understand that how complex this is varies depending on what the research is focused on, but my point is that all this information should be gathered in one place. Even if in your opinion it is poorly explained, an overview or attempt should be made.



To summarize, my point is that in order to mitigate the risks of using AI for research, how those risks were mitigated should be easy to find. AI is gradually becoming more common in astronomy research, and due to its nature, researchers should be more conscious of its risks. In my opinion, researchers should write at least something about how they tried to evaluate the robustness, assurance, and specification in their article, so that anyone interested can go to one place in the article for an overview of where that information can be found. And if they haven't personally tested something, they should reference where it has already been done, or explain why in their opinion it is unnecessary. This way, the knowledge of AI safety could spread to other fields, and a safer culture could develop around using AI for research. Researchers could more easily understand how different algorithms are evaluated, and more easily compare, in peer review, cases where different evaluation techniques gave different answers. 



  1. Impey, C. (2023). Analysis: How AI is helping astronomers study the universe. Retrieved 18.11.23 from:
  2. Sutter, P. (2022, 17. February). Saving cosmology with AI.  Retrieved 17.02.2022 from:
  3. Rodríguez, J.-V.,  Rodríguez-Rodríguez, I., &  Woo, W. L. (2022).  On the application of machine learning in astronomy and astrophysics: A text-mining-based scientometric analysis. WIREs Data Mining and Knowledge Discovery,  12(5), e1476. 
  4. Lazar, S. & Nelson, A. (2023). AI safety on whose terms? Science 381 (6654).
  5. Rudner, T. G. J. & Toner, H. (2021). Key Concepts in AI Safety: An Overview. Retrieved 21.11.23 from:
  6. Cicone, C. (2021) Appendix B: Error Analysis. From lecture notes in the course AST2210.
  7. Shah, R., Varma, V., Kumar, R., Phuong, M., Krakovna, V., Uesato, J., & Kenton, Z. (2022). Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals. arXiv preprint arXiv:2210.01790.
  8. Hendrycks, D., Carlini, N., Schulman, J., & Steinhardt, J. (2022). Unsolved Problems in ML Safety. arXiv preprint arXiv:2109.13916.
  9. Guo, J., Bao, W., Wang, J., Ma, Y., Gao, X., Xiao, G., Liu, A., Dong, J., Liu, X., & Wu, W. (2023). A comprehensive evaluation framework for deep model robustness. Pattern Recognition, 137, 109308.
  10. Villaescusa-Navarro, F., Anglés-Alcázar, D., Genel, S., Spergel, D. N., Somerville, R. S., Dave, R., Pillepich, A., Hernquist, L., Nelson, D., Torrey, P., Narayanan, D., Li, Y., Philcox, O., La Torre, V., Delgado, A. M., Ho, S., Hassan, S., Burkhart, B., Wadekar, D., Battaglia, N., Contardo, G., & Bryan, G. L. (2021). The CAMELS Project: Cosmology and Astrophysics with Machine-learning Simulations. The Astrophysical Journal, 915(1), 71.
  11. Ziegler, D. M., Nix, S., Chan, L., Bauman, T., Schmidt-Nielsen, P., Lin, T., Scherlis, A., Nabeshima, N., Weinstein-Raun, B., de Haas, D., Shlegeris, B., & Thomas, N. (2022). Adversarial Training for High-Stakes Reliability. arXiv preprint arXiv:2205.01663.
  12. Nanda, N. (2023). Mechanistic Interpretability Quickstart Guide. Retrieved 28.11.23 from: 
  13. Räuker, T., Ho, A., Casper, S., & Hadfield-Menell, D. (2023). Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks. arXiv preprint arXiv:2207.13243.
  14. Ortega, P. A., Maini, V. & the DeepMind safety team. (2018). Building safe artificial intelligence: specification, robustness, and assurance. Retrieved 28.11.23 from: 
  15. Park, P. S., Goldstein, S., O'Gara, A., Chen, M., & Hendrycks, D. (2023). AI Deception: A Survey of Examples, Risks, and Potential Solutions. arXiv preprint arXiv:2308.14752.
  16. Wikipedia contributors. (2023, November 28). AI alignment. In Wikipedia, The Free Encyclopedia. Retrieved 13:55, November 29, 2023, from
  17. Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On Calibration of Modern Neural Networks. arXiv preprint arXiv:1706.04599.
