Description
An instance of the Battery2030+ Excellence Seminar series featuring Simon Clark
Item | |
---|---|
Type(s)/Category(s) | Event |

Event | |
---|---|
Event series | |
Start date | 2025-03-27 |
End date | 2025-03-27 |
Duration | |
Location | |
URL | |
Organizer | |
Minutes taker | |
Project(s) | Battery 2030+ |
Associated OU(s) | |
In this Battery2030+ Excellence Seminar, Dr. Simon Clark discusses the role of semantic technology in accelerating battery research and innovation. He begins by highlighting the need for structured data in battery R&I. Then, he reviews existing semantic technology solutions from other domains and the challenges of extending them to materials science. Next, he shows some examples of how to bring battery data into the Semantic Web and what kinds of opportunities it opens. Finally, he closes with a look to the future.
Slides
The slides from the talk are available at the link below.
Transcript
Great, thank you very much for the introduction. It's an honour and a pleasure to be invited to speak here today.
So, yes, my name is Dr. Simon Clark, and today I will discuss how semantic technology accelerates battery research. And my talk is divided into four parts, and we'll begin at the beginning with part one, the knowledge paradox.
The Knowledge Paradox
So, when I was a kid, I think like many scientists, I used to love science fiction movies, especially Star Trek. And one thing that all science fiction universes seem to have in common is that in the future, humans can work directly with computers and get the answer to their question almost immediately. You know, if you've ever seen Star Trek, there's scenes where they'll say something like, "Computer, retrieve engine data and plot new course." And the computer does it just automatically and without question.
Would that we could live in such a world.
How nice would it be if we could just say, "Computer, retrieve the OCP curve for graphite." You know, none of this "Let me open Excel." None of this "I think that was on an email that somebody sent me a few weeks ago," but just instantly, at your fingertips, with the power of the computer, to know what to do with it and how to process it.
And, of course, we're closer to this reality today than we ever have been before, especially since the advent of chatbots and large language models and artificial intelligence like ChatGPT. And one thing that ChatGPT and large language models tend to be really, really good at is responding to queries about statistics from things like pop culture and finance and sports, because that was the majority of what they were trained on when they were trained on information from the Internet.
And so to test that and to kind of test how does that compare with the same functionality in battery science, I did a small experiment. I went to ChatGPT and I asked ChatGPT to plot for me the total goals that Wayne Gretzky, the great hockey player, scored over the course of his career. And ChatGPT gave me an answer that looks like this. I then went to the NHL database and retrieved the actual data to see how close ChatGPT was. And the answer is surprisingly close, almost exact. Really, really very impressive.
And I thought, "Well, you know, maybe that's a little too easy, because that's just retrieving one data point that probably exists somewhere that it's seen before. Let's have it do some actual processing." And so I said, "Tell me not only the total goals that Wayne Gretzky scored, but tell me what the average scorer of the top 50 in the NHL would do." And ChatGPT responded and said, "Well, you know, I don't have access to that data. It might not be accurate." I said, "Do your best shot." So it did, and it gave me a kind of linear approximation, which also turned out to be incredibly close. And so the responses are surprisingly accurate when we go at these topics that it tends to know a lot about, pop culture, finance, sports.
What about batteries? What about battery data? What if I asked it, instead of plotting me hockey goals, what if I asked it to synthesise a curve for the open circuit potential of graphite vs lithium that I can use in my P2D battery model? And ChatGPT responded, and it's not anywhere close. It's way off, compared with a kind of representative actual value. You know, I mean, it's less than 1 volt, so that's good. But pretty much everything else is way off.
And I asked ChatGPT, I said, "This is not really correct. Can you describe it in words? Can you describe it in text?" And when it responds, it gives a response that is pretty accurate in terms of text. It's a gently sloped, downward sloping curve that has different phases with different plateaus. And we can say that this is pretty accurate.
So what's going on? You know, why can ChatGPT respond to so many questions about sports and finance and pop culture, but struggles with a relatively simple thing like plotting the open circuit voltage profile for graphite?
And there are probably a few reasons, but one of them we can think about is this. Over the past few years, there has just been a wave of new research published on battery science. And we might look at this and say, "Well, on the one hand, this is a great thing. There's so much research being published, there's so much new knowledge and data, and we can use this, even though it's more information than any one human can really read."
If you were to read one paper a day, it would be less than 1% of what was published in 2024. Even though it's too much for humans to read, we now have the tools that we need to process large amounts of unstructured data and extract out some meaning from that in terms of AI agents and chatbots and all of this.
So on the one hand, we might look at that and say, "This is going to be great. This is exactly what we need to train machine-learning and AI agents, right?"
Maybe not, because while all of this research is being published, it's typically published as plain text in PDF format, with the actual data embedded in figures, embedded in tables, and not really made available in its raw form.

So when we look at the types of raw data that are actually made available, and what machine-learning agents have available to learn from, at least in the public literature, we see that there's generally a lack of structured, semantic data published alongside the article to facilitate machine actions.
I had a PhD student do a cursory literature search for graphite OCP curves that have been published in the literature, that contain the raw data, that are published in an open format, and that are published under a permissive licence. And he did a little bit of looking, and even though this is obviously one of the most widely used anode materials in lithium-ion batteries, which is a very important topic generally, he only found six, six of those data sets that were published and that we were able to learn from.
So if we want to bring the power of machines into battery research, if we want to really leverage the opportunities that AI and the machine learning and machine processing bring, we need to get better about publishing data. And not only publishing data, but publishing it in a way that machines are able to find it, that they're able to understand it and that they're able to link it up with other concepts and other data sets.
So when we think about the knowledge paradox, we have all of this knowledge that's being generated, but very little of it is structured in a way that can be accessed by machines. So we can say that the bottleneck is not necessarily data generation, but it's knowledge extraction. And we need to give data structure, meaning and links to other pieces of information in order to get the best value from that.
Nothing New Under the Sun
Good. So that brings us to part two. Nothing new under the sun,
because we're not the first people who have faced this problem, right? That's not the first time, especially in a scientific domain, that we're faced with a lot of unstructured information, a lot of unstructured data, and we need to try to extract meaning from it, and we need to try to link it together so that it's easier to find and easier to operate on and easier to access.
And when it comes to solutions to that problem, there's one really big one that's been around for about 30 or 40 years, called the World Wide Web. Few people remember, and some people don't know at all, that the World Wide Web was actually an initiative that came out of CERN in Switzerland to manage research data. And it began in 1989 when a very young scientist, 34-year-old Tim Berners-Lee, was working at CERN and was kind of frustrated with the way that things were going, because they had so much information, so many teams and people and data and knowledge and information coming out, and a lot of it was just getting lost in the mix because it was hard for any one person or any one system to really keep track of it all.
And so in his original proposal, he said, "Many of the discussions that we're having end with this question of, 'Yes, but how will we ever keep track of such a large project?'" And his proposal aimed to answer that question.
And what he originally called the Mesh, which was developed to manage research data at CERN, quickly morphed into what we know today as the World Wide Web and has become the most efficient way of sharing and disseminating and linking information that humans have really ever come up with.
But they quickly found out that there's a problem, because the original World Wide Web was just about linking documents, you know, HTML, Hypertext Markup Language. And that allows you to say, "Okay, this document is linked to that document, that document is linked to this other document." But it doesn't really tell you a lot about what those documents are, what's the content of those documents, what's the nature of their link between each other.
And so for that, they developed the next step of the World Wide Web, which is based on what we call semantic technology.
Now, semantic technology, when you hear the word semantic, I just want you to think meaning, which is what semantic means. But semantic technology is all about attaching meaning to data.
And there's a stack of tools and a stack of approaches that we can use to do that. At the bottom, the most foundational level, are the World Wide Web standards. So these were things that were developed in order to help bring semantics into the World Wide Web, and they set the fundamental rules and grammar for how we express information semantically on the Web. And the fundamental unit of semantic technology in RDF, the Resource Description Framework, and the other World Wide Web standards is called a triple. And a triple is very simple. It's just a three-positional statement that links two bits of information.
So an example might be to say, "Simon is a person," that's one triple. "Simon works at SINTEF," another triple. And even though each one of those individual bits of information is kind of not so helpful on its own, once you start to link them together, and once you start to bring together hundreds, thousands, millions of triples into a coherent network and a coherent graph, you're able to reason over them and find links between things and respond to abstract and complex questions.
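As an illustration of that triple structure, here is a minimal sketch in Python using rdflib; the `EX` namespace and the `worksAt` predicate are placeholders invented for this example, not terms from a published vocabulary.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

# Placeholder namespace for this example only, not a published vocabulary.
EX = Namespace("https://example.org/")

g = Graph()

# Triple 1: "Simon is a person"
g.add((EX.Simon, RDF.type, FOAF.Person))
g.add((EX.Simon, FOAF.name, Literal("Simon Clark")))

# Triple 2: "Simon works at SINTEF" (worksAt is a made-up predicate here;
# a real graph would reuse a term from an existing vocabulary)
g.add((EX.Simon, EX.worksAt, EX.SINTEF))
g.add((EX.SINTEF, RDF.type, FOAF.Organization))

# Serialise the graph so both humans and machines can read it
print(g.serialize(format="turtle"))
```

Once thousands or millions of such triples are merged into one graph, a machine can follow the links between them and answer questions that no single triple contains on its own.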
So World Wide Web standards give us our foundational tools that we need to work with.
Next we have what are called controlled vocabularies, and those are collections of terms with a defined and controlled meaning. So if we think about triples like our sentences, "Simon is a person," controlled vocabularies give us the words that we need in order to create them. So what does "is a" mean? What does "works at" mean? Where can I find the definition of those terms? That's the kind of thing that you can find in controlled vocabularies, and they work kind of like a dictionary.
The next level up are ontologies, which is my favourite, and we'll hear a lot more about it later. An ontology is a rich resource that includes relationships between terms to formalise knowledge about a domain. And it's kind of like a textbook, right? We're trying to take knowledge that we have in our heads, knowledge that's been developed through research or innovation, and translate that into a machine-readable language.
One of the examples that I like to use to demonstrate that is the movie The Matrix, also a science fiction movie, maybe you know it. There's a scene where Keanu Reeves needs to learn every martial art in just a few minutes. And because his brain can be hooked up directly to a computer, they plug him in and they type a few lines, then he turns to the camera and says, "I know Kung Fu."
And that's basically an ontology. Somebody has taken all of the knowledge about Kung Fu and encoded it in a machine-readable language and uploaded it back into Keanu Reeves's brain.
So an ontology describes the knowledge about a domain, a knowledge graph extends that and starts to create actual instances of metadata and information, kind of like a social media network. And finally, the last step is actually tagging specific pieces of information into the knowledge graph, specific pieces of data that machines can work with, which is called linked data.
So I've already kind of given a description of what an ontology is. It's a formal description of knowledge about a domain. But what does it look like in practice?
Well, one of the most common ontology resources that's used to annotate most of the data on the web and one of the reasons that ChatGPT is so good at understanding things about pop culture and sports and finance is schema.org. And schema.org describes things that people tend to search. It was developed by Google and Yahoo and a few others.
And in schema.org, we might say, "Okay, we're going to create a class that is a person, and that person has a property that's, you know, his name; maybe they have a nationality that links them to a country. And if they're an athlete, like Wayne Gretzky, like we saw before, they can be connected to a sports team that maybe has some team name."
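A minimal sketch of what that schema.org markup could look like, written here as JSON-LD via a Python dict; the property choices (nationality, memberOf) are one plausible mapping, and the values are purely illustrative.

```python
import json

# Illustrative schema.org description of an athlete, serialised as JSON-LD.
gretzky = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Wayne Gretzky",
    "nationality": {"@type": "Country", "name": "Canada"},
    "memberOf": {"@type": "SportsTeam", "name": "Edmonton Oilers"},
}

print(json.dumps(gretzky, indent=2))
```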
So this kind of technology has been extremely well developed for things that people tend to Google.
But up until about five or six years ago, up until Battery 2030+ and BIG-MAP started to really invest in trying to build this infrastructure, there wasn't anything similar for expressing battery and electrochemical knowledge, until we developed the Battery Interface Ontology.
BattINFO is basically the same kind of approach that schema.org takes for general things, but instead, specifically directed at electrochemical systems and batteries.
So an analogous example might be to say, "We have an electrochemical cell which has a positive electrode and a negative electrode, that is some electrode, and that electrode is in contact with an electrolyte, and it has an active material, that is a material."
And by starting to make these expressions formal and machine-readable, we open it up to machines being able to navigate graphs that are based on BattINFO and people being able to annotate their research data to bring it into the web and to make it accessible to semantic agents.
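The same pattern applied to the cell described above might look like the sketch below; the `cell:` prefix and the class and property names mirror the spoken example but are placeholders, not the actual BattINFO IRIs.

```python
import json

# Illustrative, BattINFO-style description of a cell. The "cell:" prefix and
# the term names are placeholders, not the published ontology terms.
cell = {
    "@context": {
        "cell": "https://example.org/battery-terms#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    },
    "@type": "cell:ElectrochemicalCell",
    "cell:hasNegativeElectrode": {
        "@type": "cell:Electrode",
        "cell:hasActiveMaterial": {
            "@type": "cell:Material",
            "rdfs:label": "graphite",
        },
        "cell:isInContactWith": {"@type": "cell:Electrolyte"},
    },
}

print(json.dumps(cell, indent=2))
```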
So the Semantic Web is based on this technology, and it's an extension of the original World Wide Web that's designed to be a web of data navigated by machines. So we may have some websites which are kind of indicated by our nodes here, connected to different data sources.
And the reason why ChatGPT knows so much about sports and the reason why it can respond to that query is because within the Semantic Web, we have markup that links the topic of Wayne Gretzky to ontology terms that say he is a person, that he plays for the Edmonton Oilers, that is a sports team, and that these things can be linked back to specific data sources and specific databases in order to train on and to learn from.
So this has been done very well in search engines. But what about science? What about other fields of knowledge? Has this been applied in ways that have been helpful? And the answer is yes.
Bioinformatics, especially, was one of the early adopters of semantic technology, going all the way back to the 70s, when they started to define common data formats and common reporting standards for proteins. Building on top of that, they started to digitise it in the 90s and to introduce semantic concepts and ontologies in the early 2000s. And then recently, when Google DeepMind wanted to train an AI agent, a neural network, to predict protein structures, they had 50 years' worth of data to go back and build on.
And so that development, which won the Nobel Prize last year, was only made possible by decades of vision and good implementation of digitalization policy within that field. And I think it's something that we can look at and try to take some inspiration from.
But that being said, there's a reason, I think, why this has been difficult to bring to materials-science-based fields, and that's because it's relatively easy to implement semantic technology and semantic resources for things that have a limited scope. Proteins, cool. Maybe tabular data, great. Data catalogues, got it. But materials science is an extremely broad and diverse field. It's chemistry, it's physics, it's manufacturing, it's simulation, it's all of these things.

And we have to be able to bring these together in a well-grounded, flexible and logically consistent framework, so that when we do development on one topic, it's able to reuse terms from other domains and create stable links, so that these don't become isolated, siloed little bits of knowledge, but really all contribute to one big mesh and one big map of materials science knowledge.
And the risk is, of course, that if we fail to make that really stable and these things become siloed and they don't interact well together, that the links start to break down and a lot of the power falls out.
But the good news is that this is something that the EMMC has really been pushing, and they have developed the EMMO, the Elementary Multiperspective Material Ontology, which serves as a top-level unifying framework for expressing knowledge and information about materials science. And this is a huge undertaking that's really going to revolutionise the field of materials science and digitalization.
And within battery science, within BIG-MAP and within Battery 2030+, we're building on top of EMMO to try to get to knowledge representation for electrochemistry and batteries. So we extend EMMO with knowledge about chemical substances, focusing mostly on the ones that are relevant for batteries, we extend it with knowledge about electrochemistry, and we extend it with knowledge about batteries. And taken all together, these resources constitute the battery interface ontology that we call BattINFO.
Semantic Technology for Batteries
Great. So that brings us to part three, which is semantic technology for batteries.
So I think we've seen the problem that we need to structure our battery knowledge a little bit more if we want to make it accessible for machines. We've seen that there exists a solution to this problem, which is semantic technology developed for the World Wide Web and pioneered in other areas of science as well. But how do we bring that into the world of battery research specifically?
So there's a few layers to it. Let's take that example that we started with. If we imagine we're in the future where everything is just seamless and automated and it all works great, like in Star Trek, we can come up to our computer and say, "Computer, retrieve the results from discharging my CR2032 cell."
And in order to respond to that query, what does the computer need to know about? Well, it needs an endpoint in order to access the information. It needs some ontology that describes what that data is and how it links to other things. It needs a link to a knowledge graph to say, "Okay, I'm not interested in just any old CR2032 or just any battery discharging data set, but that specific one that this person wants," and find sources to the original data.
So we can start to break this up into different layers: on the one hand, the data layer, which is generally maintained and developed by data stewards; the semantic layer, which comprises the ontology and the knowledge graph, and the endpoint, which is usually developed by developers, for example electronic lab notebook developers or data repository developers; and the user interface layer, which is typically where researchers work.
So if you're a researcher and computer programming isn't necessarily your thing, that's okay, because it's not designed that everybody should have to dive into the details. And it's really up to the user interface layer to make that easy and efficient.
So to demonstrate how we can start to respond to this query and how we need to structure our data in order to make that happen, I've actually created an example which is available at this QR code here. And I'll put a link in the chat afterwards and you can feel free to explore on your own. But we'll explore it a little bit right now.
So we're going to walk through the steps of what does it take in order to go from our raw data that's come out of our experiment and build up this layer so that our handy machine can respond to our query.
So this is an entry on Zenodo, which is what the QR code that I just showed links to, and I'll put a link in the chat later, showing an example. And we can see here, I've put it through a FAIR checker and it's got a FAIR score of greater than 90%. And this contains data from discharging a coin cell at 11 milliamps.
And it contains some human-readable tables to describe what the cell is and what test equipment it was done on. It has a licence, it has a picture showing a representation of what exists there. But most importantly, it contains machine-readable metadata to make this linked data. And we can dive into that and see exactly what it takes to make linked data.
So if we have our data set and we want to publish it according to these standards and really make it integrated into the Semantic Web, there are four easy steps that we can take to do that.
Easy is relative. I think it's easy. Four relatively easy steps.
So the first is to just save the test data in a file that is structured and open. And that second bit is really the important part, especially for batteries, because as many of you know, many of the cyclers, many of the pieces of equipment, output things in proprietary binary formats. But getting it into an open format like CSV, or text, or JSON, or Parquet allows it to be opened and worked with by many different agents, not just the cycler software.
So already that one step, just publishing it in a file that's structured and open, gets you two stars in the five-star framework.
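A short sketch of that first step with pandas; the column names are illustrative, and the Parquet export assumes a Parquet engine such as pyarrow is installed.

```python
import pandas as pd

# Step 1: export the test data in an open, structured format.
# Column names are illustrative; use whatever your cycler actually reports.
df = pd.DataFrame({
    "time_s":     [0.0, 10.0, 20.0],
    "voltage_V":  [3.20, 3.18, 3.15],
    "current_mA": [-11.0, -11.0, -11.0],
})

df.to_csv("cr2032_discharge.csv", index=False)          # plain text, readable anywhere
df.to_parquet("cr2032_discharge.parquet", index=False)  # typed, columnar, compact
```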
Step two, post the data to the web. It can be Zenodo, as I've shown, it can be the BIG-MAP archive, maybe an instance of Kadi4Mat, but make it accessible somewhere on the web. And generally here I'm talking about open data, publishable public data. But the same also goes for confidential data, that you can put it within your own internal systems and regulate access to it.
Great. So step three starts to get a little bit more complicated. We're going to take a little detour down cyber road and see some codes. So don't let it scare you, we'll bring it back if you're not a programming person. But step three is to describe our data with semantic terms.
And we just need to provide a little bit of metadata that looks something like this, and we can walk through kind of what exactly it is that this metadata tells us.
So first it answers the question, what is this? And we can see, okay, this is a battery test result. And it's an instance of a data set, according to the DCAT vocabulary.
Okay, cool. Where can I download it? And we can see, oh, there it is. A DCAT download URL.
Great, so now we know what it is, we know how to download it. And the third question, "Okay, well, what does this content mean?" And for that, we can point to a table schema which, if you use a standard format like the battery data format or anything that's outputted by one of the main cyclers, there exist schemas for that already that you get for free.
So with those three bits of metadata, now a machine that's come along that's read this piece of information it knows what it is, it knows where to download it, and it knows what the content means. Easy peasy.
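Put together, that minimal metadata might look like the sketch below; dcat:Dataset, dcat:downloadURL and csvw:tableSchema are real vocabulary terms, while the identifiers, URLs and file names are placeholders for this example.

```python
import json

# Step 3: a minimal machine-readable description of the data set.
metadata = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "csvw": "http://www.w3.org/ns/csvw#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    },
    "@id": "https://example.org/dataset/cr2032-discharge",  # placeholder identifier
    # What is this? A data set containing a battery test result.
    "@type": "dcat:Dataset",
    "rdfs:label": "Battery test result: CR2032 discharge at 11 mA",
    # Where can I download it?
    "dcat:downloadURL": {"@id": "https://example.org/files/cr2032_discharge.csv"},
    # What does the content mean? Point to a table schema describing the columns.
    "csvw:tableSchema": {"@id": "https://example.org/schemas/battery-test.json"},
}

with open("cr2032_discharge.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```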
Finally, step four is to link our data to other data. And so we can extend our metadata file description just a little bit to say, "Okay, well, tell me a little bit more, how is this data generated?"
And we can say, "Well, this was a battery test and it was generated from a discharging process. Great. Well, you keep saying the battery, what's the battery? Well, here we can see it's a test object of type CR2032 and that's linked to an image that we can find. Okay, great. What type of equipment was used to do it? Here we see it has test equipment, Battery Cycler, that's manufactured by Biologic."
And so by providing this information in a structured machine-readable format, along with our data that's been published in an open structured format on the web, machines that come across it or humans that come across it can understand what it is, where to find it, how to use it, and how it was generated.
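Extending the metadata from the previous sketch with those provenance links might look like this; the `ex:` property and class names echo the spoken description but are placeholders, not the exact BattINFO terms.

```python
# Step 4: extend the metadata with links describing how the data was generated.
# The "ex:" terms are placeholders invented for this sketch.
metadata["@context"]["ex"] = "https://example.org/terms#"
metadata["ex:wasGeneratedBy"] = {
    "@type": "ex:DischargingProcess",
    "ex:hasTestObject": {
        "@type": "ex:CR2032",
        "ex:image": {"@id": "https://example.org/images/cr2032.png"},
    },
    "ex:hasTestEquipment": {
        "@type": "ex:BatteryCycler",
        "ex:manufacturer": "Biologic",
    },
}
```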
And that starts to open up a lot of really interesting and really exciting things I'll show you.
So, for example, if we take the example that we had before, where we want a machine or a computer to respond to some query about this information, how would it do that and what else can it do?
So we may say, "Okay, our agent will start, and it will navigate and find the repository on Zenodo." And from there it can access the metadata file as we've gone through, parse it, bam. Got that knowledge, knows what to do with it, and that can then point to other places. CR2032 is a term in the Battery Interface Ontology. It has a link to the context, so it knows how to interpret that. That can then point it to the BattINFO documentation and the BattINFO resource. And see, okay, we have an elucidation, and we also have a link to an equivalent identifier in Wikidata.
So the machine can then take that one step further, go to Wikidata, access all of the information that's there, which includes things like, you know, the diameter and the materials and an image of the cell. So even though we didn't specify that information, and even though we didn't provide a particularly extensive metadata description of the cell, we just said it's CR2032, we're able to leverage all of the existing and all of the linked information that's in BattINFO, that's in Wikidata. From Wikidata, we get a link to the Google Knowledge Graph and we can say, "Okay, we saw on Wikidata that this has manganese dioxide in there." We can get information about manganese dioxide from the Google Knowledge Graph that can then point us to the PubChem database, where we can also retrieve information about that material.
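A toy version of that traversal in Python; the URLs and key names are placeholders, and a real agent would first expand compact terms against the JSON-LD @context before following them.

```python
import requests

def follow(url: str) -> dict:
    """Fetch a linked resource and parse it as JSON(-LD)."""
    return requests.get(url, headers={"Accept": "application/json"}, timeout=30).json()

# 1. Start from the published data-set record (placeholder URL).
record = follow("https://example.org/dataset/cr2032-discharge")

# 2. Follow the link to the ontology term for the cell type (e.g. CR2032).
term = follow(record["cellType"])

# 3. Follow the term's equivalent-identifier link to the Wikidata entity,
#    where properties like diameter, materials and images become available,
#    and from which further links (Google Knowledge Graph, PubChem) can be followed.
entity = follow(term["equivalentWikidataEntity"])

print(entity)
```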
And so this, I think it starts to point to the power of linked data, that it's not just what you explicitly state, but it buys you entry into the Semantic Web. And once you're there, it opens up all sorts of other sources of information that computers and humans can learn from.
And that's really just a small taste. I wish I could go on, but I'll keep to the time. But I think it really opens up a lot of new possibilities for machine-accelerated battery innovation, because I've shown an example about cells, but it's also materials, and components, and test procedures, and data sets. Especially when we think about things that have a really complex and nuanced composition, like electrolytes, expressing these things in machine-readable ways and being able to query over complex electrolyte formulations really opens up a lot of interesting options. And by doing that, we can leverage the power of artificial intelligence and the power of automated processing to get the most value from our data.
So I promised it would only be a short trip down cyber road and we can bring it back. And I can say, semantic technology is mostly for machines, but it's also for humans. And we're developing resources to try to make an efficient layer where humans can also benefit from the power of linked data without necessarily having to really get into the source code and the JSON expressions and RDF, but keeping things within the web. And to demonstrate that, we'll come back to this example where we can click on this link, CR2032, and that takes us to the Battery Knowledge Base.
And this is a resource that's being developed by Battery 2030+ to act as a human- and machine-readable source of knowledge about batteries. So through the Battery Knowledge Base we can get to links to BattINFO terms, for example, as we see here, CR2032. And that can then point us, as we talked about earlier, to other sources of information, like Wikidata, where we have all of this information available about materials, about composition, and other identifiers within the web, like the Google Knowledge Graph. We can learn about the materials, so we can say, "This is manganese dioxide," and more information can be linked to other sources and other identifiers: the InChIKey, the CAS registry number, the PubChem ID, and that takes us all the way back there.
So by expressing this information in a machine-readable way, we also get the added dividend of having it accessible in human-readable systems. And the Battery Knowledge Base especially makes that easy, because here we can say, "Okay, this is also a link to that particular data set." And here we can see we've scraped the data set from Zenodo, the metadata, right into the knowledge base. So the original data is still hosted at Zenodo, we don't duplicate that, but the metadata itself is in the Knowledge Base. And we can say, "Okay, this was used in the Battery 2030 Excellence seminar talk," and there are the slides where people can access it.
And we can link that also back to the person, back to the creator of that data set, which is my profile in the Battery Knowledge Base. And here you can see information about where I work, and information about what projects I work in, and my technical expertise, and some resources, recorded lectures that I've done in the past, publications that I've made, selected data sets that I published, which just so happens to be this one, and even embedded multimedia content. So you can link to videos, or you can link to pictures, or whatever on YouTube.
So yeah, I think the options that it opens when we start seeing data not as some static dead thing that we just have to archive in a repository somewhere and then forget about, but when we see that this thing is a living connection, it's a living connection to information about the cells, about the people who did it, about the publications that come from it, it opens up a lot of interesting opportunities.
The Future is Semantic
So finally, to wrap things up, we've seen the problem, we've seen that there exists technology to solve this in the Semantic Web and World Wide Web standards. And we've seen that we can start applying this now to accelerate battery research and to really get more value from our data.
But what's the future? What does the future look like? Where do we go from here? And to call back to the beginning and to take a bit of wisdom from Wayne Gretzky, he had this famous quote: "Skate to where the puck is going to be, not where it has been."
And if we think about that in battery research, okay, where's the puck going? And where do we need to be?
The challenge that we face, I think, is clear. We're all aware every day that the climate is starting to change and we need to try to address that. The IEA says that batteries need to lead to a 6-fold increase in global energy storage if we're going to meet our 2030 targets. The Batteries European Partnership Association, BEPA, recently released a statement that to remain on the battery map, the EU must run both a sprint and a marathon.
So we've got our work cut out for us. There's a lot of work that we need to do. But the good news is now we have the data and now we have the tools, and this is entirely within our grasp to achieve.
So where do we go now? Well, we need to accelerate battery innovation and semantic technology is the key to unlocking the full power of AI and machine processing for that.
Up till now we focused a lot on creating the necessary infrastructure which has been done through Battery 2030+ and through BIG-MAP. And these have really been herculean efforts. That's a big team effort, especially with support from the EMMC and EMMO, and I think we have to acknowledge and thank them for that.
But now our team is starting to grow across different EU projects. So now we have Full-Map that's starting to carry the torch on. We also have projects like DigiBatt, which are focused on using this to accelerate battery testing and it's starting to grow beyond Europe, or beyond the EU.
In Britain, there's the Faraday Institution, which is promoting the Battery Parameter Exchange to support interoperability of battery models; that builds on top of BattINFO and on top of these semantic resources.
In the United States there's the Battery Data Alliance, which is an initiative from the Linux Foundation, to create a standardised battery data format that also builds on BattINFO and extends these semantic tools.
So we're really starting to build a community for this, which is very important, because these tools live or die through community engagement.
But we're also kind of coming to a fork in the road, because now we need to start shifting from infrastructure development to deployment, to training and to integration.
And the key to doing that, I think, is to make it easy and to make it automated, and with the ultimate goal that people should use it without really knowing that they're using it.
And I can just tease a couple of new releases and new tools that will be coming out to support this.
One is a Python package called COLD, which is designed to support ontology-based linked data in a Python framework. And COLD builds on Pydantic. It's an interface that people know and people love already that not only structures our ontology-based information, but also provides built-in validation.
So we can build data instances in kind of a friendly, easy way and automate the production of our JSON-LD. So nobody has to sit here and write JSON-LD code anymore, but we can also build in validation. So if we say, our diameter doesn't have a unit millimetre but has a unit gramme, it throws a flag and says, "Hey, diameter needs a length unit."
And I think the really great thing about this is that's not hard coded anywhere in this package, that comes from the ontology. So it maintains the ontology as a single source of knowledge that's able to be applied for advanced reasoning.
We can also try it with other things, like saying, "Okay, our cell has a case." "That's not a case, it's an electrode." And again, we can get validation warnings to say that we were expecting a case, not an electrode.
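The COLD API itself isn't shown here, but the validation idea can be sketched with plain Pydantic; in the real package the allowed units would come from the ontology rather than being hard-coded as in this sketch.

```python
from pydantic import BaseModel, field_validator

# Hard-coded here for the sketch; an ontology-driven tool would derive this
# set from the ontology's definition of length units.
LENGTH_UNITS = {"Millimetre", "Centimetre", "Metre"}

class Quantity(BaseModel):
    value: float
    unit: str

class Diameter(Quantity):
    @field_validator("unit")
    @classmethod
    def unit_must_be_length(cls, unit: str) -> str:
        if unit not in LENGTH_UNITS:
            raise ValueError(f"diameter needs a length unit, got '{unit}'")
        return unit

class CoinCell(BaseModel):
    diameter: Diameter

# Valid instance:
CoinCell(diameter=Diameter(value=20.0, unit="Millimetre"))

# Raises a ValidationError: diameter needs a length unit, got 'Gram'
Diameter(value=2.5, unit="Gram")
```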
So bringing it into Python, I think opens up a lot of options for integrating with other systems.
I'd also like to highlight some good work that's been done at Empa in Switzerland by Corsin Battaglia and Nukorn Plainpan, who have developed an Excel parser so that people can write their metadata in an environment that they're familiar with, Excel, and have that automatically parsed into BattINFO and serialised as JSON-LD metadata for RDF integration.
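The Empa parser itself isn't shown here, but the pattern is roughly this: researchers fill in a simple two-column sheet of terms and values, and a script maps it to JSON-LD. The sheet layout and the term prefix below are assumptions made for this sketch.

```python
import json
import pandas as pd

# Read a two-column sheet of (term, value) pairs filled in by the researcher.
rows = pd.read_excel("cell_metadata.xlsx", header=None, names=["term", "value"])

# Map each row onto a placeholder term namespace and serialise as JSON-LD.
doc = {
    "@context": {"ex": "https://example.org/terms#"},
    "@type": "ex:ElectrochemicalCell",
}
for term, value in rows.itertuples(index=False):
    doc[f"ex:{term}"] = value

with open("cell_metadata.jsonld", "w") as f:
    json.dump(doc, f, indent=2, default=str)
```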
Great. So to wrap up, Battery Semantic Technology is a community resource. This is something that lives or dies through community engagement. And this is something that we really want to start putting at the forefront of our activities here. So we plan to continue to develop the Battery Knowledge Base as a human-readable source of knowledge and metadata.
We will soon make an announcement about opening monthly community office hours to discuss questions and improvements and feel free to participate. Bring your data and we can mark it up and have a great time doing it.
And we'd also like to widen the formal governance structure to cover a wider group of stakeholders in the ontology, to make sure everything is representative of the needs of the community.
So, finally, when we take a step back and think about the place that we are right now in battery research and in the scientific world generally, never before have we had access to so much battery data, more and more every year, and the tools that we need to process it. And semantic technology allows us to combine the power of machine processing with human intuition, which will lead us to a new generation of battery innovation.
So with that, I'd just like to thank you for your time and thank you for your attention, and especially thank you to all of the collaborators that you see here. As I mentioned, this is a big effort that builds on the work of a lot of people in a lot of places, in a lot of projects, especially BIG-MAP, which was really the driver of this work in the beginning. That's now being taken up by Battery 2030 and projects like DigiBatt and of course, the European Union for supporting all this work.
Great. So thank you very much and if you have questions, happy to discuss.