The Power of Unstructured Information in Data Sharing

Advancements in technology have led to an unprecedented amount of data being generated in the PV sector. Modern inverters provide more detailed signals, and the rise of string inverters has increased the granularity of available data. Other equipment, such as weather sensors and monitoring systems, also contributes valuable insights.

However, structured data alone is insufficient. Unstructured information, such as energy yield assessments, datasheets, contracts, and emails, provides the crucial context needed to make structured data actionable and reliable. Despite its importance, unstructured information is often overlooked in data-sharing strategies, leading to inefficiencies and misalignment between stakeholders.

This paper reviews the potential of unstructured information, how to unlock it to ease data sharing, and the opportunity offered by unstructured information that remains untapped.

Unstructured information and why it is relevant in a data-sharing context

Asset management information falls into two main categories: structured and unstructured. Structured data is well-organized and easily accessible, such as numerical values in spreadsheets or predefined database fields. Unstructured data, however, consists of diverse documents, emails, contracts, reports, and other textual content that do not follow a fixed format.

Unstructured information can be further divided into two key subcategories:

  • Static Information – This includes essential reference data such as module specifications, inverter rated power, and financial models.
  • Dynamic Information – This covers continuously evolving insights, including root cause analyses, maintenance updates, and real-time performance metrics.

When sharing data, static unstructured information plays a critical role in providing context for accurate analysis. Here are a few examples:

  • Performance ratio calculations rely on module and inverter characteristics found in datasheets.
  • KPI calculations require irradiance and production budget data from Energy Yield Assessments or other resource documentation, which may have multiple versions with slight variations.
  • Site information, such as the positioning of the pyranometers or the soiling patterns observed, is essential for troubleshooting.
  • Legacy decisions hold valuable insights but are often lost when key personnel leave a company. Understanding past choices - like why soiling sensors were installed in a certain way or the rationale behind albedo selection - can be crucial for future operations.
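To illustrate the first example, the sketch below computes a performance ratio in the commonly used IEC 61724-1 form, PR = final yield / reference yield, where the installed DC capacity is derived from the module datasheet's rated power. It is a minimal sketch assuming plane-of-array irradiation as input and a 1 kW/m² reference irradiance; the function names and the example figures are hypothetical and not taken from any real site.

```python
G_STC_KW_PER_M2 = 1.0  # reference irradiance at standard test conditions, kW/m²


def dc_capacity_kwp(module_count: int, module_pmax_w: float) -> float:
    """Installed DC capacity, using the module rated power (Pmax at STC) from the datasheet."""
    return module_count * module_pmax_w / 1000.0


def performance_ratio(e_ac_kwh: float, h_poa_kwh_per_m2: float, p0_kwp: float) -> float:
    """PR = final yield / reference yield."""
    final_yield_h = e_ac_kwh / p0_kwp  # kWh per kWp, i.e. equivalent full-load hours
    reference_yield_h = h_poa_kwh_per_m2 / G_STC_KW_PER_M2  # hours at reference irradiance
    return final_yield_h / reference_yield_h


# Hypothetical month: 10,000 modules of 400 W, 150 kWh/m² in plane of array, 480 MWh exported.
p0 = dc_capacity_kwp(10_000, 400.0)
pr = performance_ratio(480_000.0, 150.0, p0)
print(f"P0 = {p0:.0f} kWp, PR = {pr:.2f}")  # P0 = 4000 kWp, PR = 0.80
```

If the wrong datasheet value were used, say 410 W instead of 400 W, the same month would show a PR of about 0.78 instead of 0.80, with nothing in the structured data revealing why.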

Without the context provided by static unstructured information, structured data loses much of its significance. Working with static unstructured information is valuable, but only if that information is of high quality. Three challenges are commonly faced:

  • Ensuring that the right, up-to-date information is used—and that all stakeholders reference the same reliable sources.
  • Being confident in the validity of the information and being able to trace its quality.
  • Not losing time clicking through documents looking for information that may or may not exist.
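The first two challenges largely come down to keeping provenance with every extracted value. The sketch below is one possible data structure for this, assuming each value is stored with its source document, page, revision number and issue date; the schema, field names and Energy Yield Assessment figures are hypothetical. It picks the value from the most recent revision while keeping the reference needed to trace it back.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExtractedValue:
    """One extracted value plus the provenance needed to trace it back (hypothetical schema)."""
    field: str            # e.g. "p50_yield_mwh"
    value: float
    source_document: str  # file name in the data room
    page: int             # page the value was read from
    version: int          # document revision number
    issued: date          # issue date printed on the document


def latest(values: list[ExtractedValue], field: str) -> ExtractedValue:
    """Pick the value from the most recent revision (ties broken by issue date)."""
    candidates = [v for v in values if v.field == field]
    if not candidates:
        raise KeyError(f"no value extracted for {field!r}")
    return max(candidates, key=lambda v: (v.version, v.issued))


values = [
    ExtractedValue("p50_yield_mwh", 118_400.0, "EYA_rev1.pdf", 12, 1, date(2021, 3, 1)),
    ExtractedValue("p50_yield_mwh", 117_900.0, "EYA_rev2.pdf", 14, 2, date(2022, 5, 9)),
]
best = latest(values, "p50_yield_mwh")
print(best.value, best.source_document, best.page)  # 117900.0 EYA_rev2.pdf 14
```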

Fortunately, new technologies allow us to address these problems.

AI unlocking the full value of unstructured information

By combining domain knowledge with AI, specifically large language models (LLMs), we can automate the extraction and contextualization of valuable insights from unstructured information.

Domain knowledge plays a crucial role in this process, guiding AI by defining what to look for, how it is likely to appear, where it can be found, what other information it connects to, and what should be included in the final output.

Aevy is doing exactly that by combining firsthand operational experience from utility solar and onshore wind with AI. The engine behind Aevy follows these steps:

  1. Retrieve all documents associated with an asset via an integrated live connection to SharePoint.
  2. Extract the information contained in those documents, even from poorly scanned PDFs.
  3. Classify documents (O&M Appendix, datasheet, budgets, etc.).
  4. Extract the pieces of information specific to each document class.
  5. Quality-assure the extracted information.
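Steps 3 and 4 can be sketched as follows. This is not Aevy's engine: a simple keyword-scoring classifier is swapped in for the LLM-based classification described above, the class names and field lists are hypothetical, and steps 1, 2 and 5 are not shown (the document text is assumed to have already been retrieved and extracted). The point is the routing: once a document's class is known, it determines which fields are worth extracting from it.

```python
# Hypothetical keyword rules standing in for an LLM-based classifier (step 3).
CLASS_KEYWORDS = {
    "datasheet": ("short circuit current", "temperature coefficient", "stc"),
    "energy_yield_assessment": ("p50", "p90", "energy yield"),
    "om_appendix": ("availability guarantee", "preventive maintenance"),
}

# Hypothetical fields to extract once a document's class is known (step 4).
FIELDS_BY_CLASS = {
    "datasheet": ["isc_a", "temp_coeff_pmax_pct_per_c"],
    "energy_yield_assessment": ["p50_yield_mwh", "p90_yield_mwh"],
    "om_appendix": ["availability_guarantee_pct"],
}


def classify(text: str) -> str:
    """Return the class that matches the most keywords, or 'unclassified' if none match."""
    lowered = text.lower()
    scores = {cls: sum(kw in lowered for kw in kws) for cls, kws in CLASS_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"


def plan_extraction(documents: dict[str, str]) -> dict[str, tuple[str, list[str]]]:
    """Map each document name to its class and the fields to extract from it."""
    plan = {}
    for name, text in documents.items():
        cls = classify(text)
        plan[name] = (cls, FIELDS_BY_CLASS.get(cls, []))
    return plan


# Hypothetical, already-extracted document text.
docs = {
    "module_datasheet.pdf": "Short circuit current (Isc) 13.85 A; temperature coefficient of Pmax -0.34 %/°C at STC",
    "eya_2022.pdf": "P50 energy yield 117,900 MWh/yr; P90 (1-year) 109,300 MWh/yr",
    "site_photo_notes.txt": "Photos of fence line, north boundary",
}
for name, (cls, fields) in plan_extraction(docs).items():
    print(name, cls, fields)
```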

Entire data rooms are processed within minutes. This makes it fast to access the information you need, with a high level of confidence and the ability to trace each piece of information back to its original source.

Imagine extracting the short-circuit current and the temperature coefficient of all your assets within seconds.
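As a rough illustration of this kind of extraction, the sketch below pulls the short-circuit current and the temperature coefficient of Pmax out of datasheet text and records the document and line each value came from, a simple form of the traceability mentioned above. Regular expressions are a deliberately simple stand-in for LLM-based extraction; the patterns, field names and datasheet text are hypothetical, and the text is assumed to have already been extracted from the PDF.

```python
import re
from dataclasses import dataclass


@dataclass
class Extraction:
    field: str
    value: float
    unit: str
    document: str  # source document, so the value can be traced back
    line_no: int   # 1-based line in the extracted text


# Hypothetical patterns; real datasheets vary widely in wording, layout and units.
PATTERNS = {
    "isc": re.compile(r"short[- ]circuit current.*?(\d+(?:\.\d+)?)\s*(A)\b", re.IGNORECASE),
    "temp_coeff_pmax": re.compile(
        r"temperature coefficient.*?pmax.*?(-?\d+(?:\.\d+)?)\s*(%/°C)", re.IGNORECASE
    ),
}


def extract(document: str, text: str) -> list[Extraction]:
    """Scan extracted datasheet text line by line and record every pattern match."""
    results = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for field, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                results.append(
                    Extraction(field, float(match.group(1)), match.group(2), document, line_no)
                )
    return results


datasheet_text = """Module XYZ-400 (hypothetical)
Short-circuit current (Isc): 13.85 A
Temperature coefficient of Pmax: -0.34 %/°C"""

for item in extract("module_datasheet.pdf", datasheet_text):
    print(item)
```

Running the same function over every datasheet in a portfolio yields one table of values per asset, each row still pointing at its source.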

Quality assurance is carried out at several levels: once all pieces of information are extracted, they are cross-checked at document level and then at data-room level. Feel free to reach out to us to better understand our quality assurance process, which is discussed in another paper.
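The idea of a data-room-level cross-check can be sketched as follows: gather the same field as extracted from several documents and flag it when the values disagree beyond a tolerance. This is only an illustration of cross-checking in general, not the quality assurance process referred to above; the 1% relative tolerance, the field names and the extracted triples are hypothetical. A document-level check could apply the same function to values taken from a single document.

```python
from collections import defaultdict


def cross_check(extractions, rel_tol=0.01):
    """
    extractions: iterable of (field, value, source_document) triples gathered
    across a data room. Returns the fields whose values disagree by more than
    rel_tol (relative to the largest magnitude), with every conflicting source.
    """
    by_field = defaultdict(list)
    for field, value, source in extractions:
        by_field[field].append((value, source))

    conflicts = {}
    for field, entries in by_field.items():
        values = [value for value, _ in entries]
        spread = max(values) - min(values)
        if spread > rel_tol * max(abs(v) for v in values):
            conflicts[field] = entries
    return conflicts


# Hypothetical data-room extractions.
data_room = [
    ("inverter_rated_power_kva", 3125.0, "inverter_datasheet.pdf"),
    ("inverter_rated_power_kva", 3125.0, "om_appendix_b.pdf"),
    ("inverter_rated_power_kva", 3000.0, "eya_2022.pdf"),
    ("module_count", 10_000, "as_built_layout.pdf"),
    ("module_count", 10_000, "eya_2022.pdf"),
]
for field, entries in cross_check(data_room).items():
    print(f"Conflict on {field}: {entries}")
```

Returning every conflicting source, rather than a single "best" value, leaves the decision to a reviewer who can open each document.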

However, new technologies do more than ease data sharing through quicker and more consistent extraction of static information; they can also unlock untapped information, enabling a new level of understanding of your assets.

The opportunity of dynamic unstructured information in the data sharing context

Contrary to static unstructured information, dynamic unstructured information is often underutilized, occasionally exchanged through discussions or emails to support data analysis, but rarely systematically leveraged.

However, with new technology such as Aevy, we can now transform unstructured information into structured data, unlocking insights we could never rely on before.

By collecting all the site reports that record events such as outages, curtailments, inspections, and maintenance, we can transform this time-bounded information into a reported-outage time series.
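A minimal sketch of this transformation, assuming the events have already been extracted from site reports into records with equipment, kind, start and end (hypothetical field names and data), and a 15-minute resolution chosen to resemble typical SCADA data. An interval is marked 1 if any reported event overlaps it at all, which is one conservative choice among several.

```python
from datetime import datetime, timedelta


def reported_outage_series(events, equipment, start, end,
                           step=timedelta(minutes=15), kinds=None):
    """
    Turn time-bounded reported events into a regular 0/1 series for one piece
    of equipment: 1 if any matching event overlaps the interval [t, t + step).
    kinds: optional set of event kinds to keep (None keeps all kinds).
    """
    series = []
    t = start
    while t < end:
        t_next = t + step
        overlaps = any(
            e["equipment"] == equipment
            and (kinds is None or e["kind"] in kinds)
            and e["start"] < t_next
            and e["end"] > t
            for e in events
        )
        series.append((t, int(overlaps)))
        t = t_next
    return series


# Hypothetical events parsed from site reports.
events = [
    {"equipment": "INV-01", "kind": "outage",
     "start": datetime(2024, 6, 1, 8, 30), "end": datetime(2024, 6, 1, 9, 10)},
    {"equipment": "INV-02", "kind": "maintenance",
     "start": datetime(2024, 6, 1, 9, 0), "end": datetime(2024, 6, 1, 9, 45)},
]
series = reported_outage_series(events, "INV-01",
                                datetime(2024, 6, 1, 8, 0), datetime(2024, 6, 1, 10, 0))
print([flag for _, flag in series])  # [0, 0, 1, 1, 1, 0, 0, 0]
```

The `kinds` filter makes it possible to build separate series, for instance one for curtailments and one for maintenance, from the same set of reports.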

In the example in Figure 2, you can see the power output of two inverters over four days alongside their respective reported-outage time series. Converting this information into a structured format not only makes it visually intuitive but also enhances its usability: it becomes a valuable input alongside SCADA and meteorological data, strengthening the insights generated by algorithms and machine learning models to detect anomalies and improve performance.

It is important to acknowledge that, because these time series depend on human input, they are unlikely to match the exact start and stop times of events recorded in the SCADA data. Nevertheless, they offer a brand-new input that may help explain certain deviations observed in data analysis.
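One simple way to use a reported-outage series alongside SCADA data is sketched below: flag daylight intervals where an inverter produced (almost) nothing and no reported outage explains it. To allow for the imprecise, human-reported times described above, each reported window is widened by a tolerance on both sides. This is an illustrative rule, not a specific machine learning model; the thresholds (50 W/m², 1 kW, 30 minutes) and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def unexplained_low_output(scada, outages, min_irradiance=50.0, low_power_kw=1.0,
                           tolerance=timedelta(minutes=30)):
    """
    scada:   list of (timestamp, irradiance_w_m2, power_kw) for one inverter.
    outages: list of (start, end) reported outage windows for the same inverter.
    Returns the timestamps where the inverter produced (almost) nothing in
    daylight and no reported outage, widened by `tolerance` on each side, covers it.
    """
    padded = [(s - tolerance, e + tolerance) for s, e in outages]
    flagged = []
    for ts, irradiance, power in scada:
        if irradiance < min_irradiance or power > low_power_kw:
            continue  # too dark to expect output, or producing normally
        if not any(s <= ts <= e for s, e in padded):
            flagged.append(ts)
    return flagged


# Hypothetical 15-minute SCADA samples: zero output at 08:30-09:00 and at 10:30,
# plus a dark sample at 10:45.
base = datetime(2024, 6, 1, 8, 0)
scada = []
for i in range(12):
    ts = base + timedelta(minutes=15 * i)
    power = 0.0 if i in (2, 3, 4, 10, 11) else 500.0
    irradiance = 20.0 if i == 11 else 400.0
    scada.append((ts, irradiance, power))

reported = [(datetime(2024, 6, 1, 8, 45), datetime(2024, 6, 1, 9, 15))]
print(unexplained_low_output(scada, reported))  # [datetime.datetime(2024, 6, 1, 10, 30)]
```

Without the tolerance, the 08:30 sample would also be flagged, even though it most likely belongs to the reported outage whose recorded start time is slightly late.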

Conclusion

As the PV sector continues to generate vast amounts of data, the ability to manage both structured and unstructured information is becoming increasingly vital. While structured data has long been optimized, unstructured data - both static and dynamic - remains an untapped resource that can dramatically enhance operational efficiency and collaboration.

AI-driven solutions, like Aevy, are closing this gap by automating the extraction and classification of unstructured information, making it instantly accessible, traceable, and actionable. By unlocking the full value of unstructured data, companies can improve data-sharing processes, enhance decision-making, and drive superior performance outcomes.

The future of data-driven asset management is here—it’s time to harness its full potential.


About the author

With an MEng from France and an executive MBA from the University of Manchester, Gautier has managed utility-scale wind and solar assets in Europe, Africa and Asia, ranging in size from 1 to 500 MW. He started as an operations engineer for a third-party asset management service provider, overseeing day-to-day operations and inspecting equipment, before moving into a role focused on contract management. He then worked as a Senior Asset Manager for the IPP Total Eren (now acquired by TotalEnergies) before co-founding Aevy in 2023.

About Aevy

Aevy supercharges asset management by automatically extracting information from your documents. Our proprietary information model ensures reliable extraction of all relevant information, unlocking a step-change in asset management and due diligence of onshore wind and utility solar assets.