Originally published on Database Trends and Applications - Mar 23, 2020

AI and Machine Learning Shine a New Light on Data Management

Recently, we took a dive into a new study from Nucleus Research on artificial intelligence and its current uses. The study uncovered a wide range of trends, including the rush by many enterprises to adopt AI without really understanding its use cases and limitations.

AI and the machine learning that underpins it are surging as top technology initiatives. Yet, the question is this: Are data enterprises ready for the changes these technologies will bring?

For data managers, AI and machine learning not only offer new ways of delivering rapid insights to business users but also the promise of improving and adding intelligence to their own operations. While many AI and machine learning efforts are still works in progress, the technologies hold the potential to deliver more enhanced analytic capabilities throughout enterprises.

For starters, the emergence of AI and machine learning is bringing greater autonomy to databases—but industry experts caution that more complete autonomy is still a distance away. This is “an exciting emerging area,” said Gerrit Kazmaier, executive vice president of SAP HANA and Analytics. “But trusting AI and machine learning solutions to take full responsibility for the management of database systems across all profiles—from low-risk to enterprise-critical applications—will take time.”

Most AI-driven database advances, at least for the near term, will be seen at the periphery of data environments and will take time to move to the center due to the complexity of enterprise data environments. “The focus to date has been on adding more complex functionality that requires constant configuration and tuning, particularly where schemas do not match or data is semi- or fully unstructured,” said Lewis Carr, senior director of product management at Actian. At the same time, the use of AI and machine learning inference in an unsupervised mode is now being successfully applied in edge environments, “where there isn’t any skilled technical support, let alone data engineers or scientists,” he added.

The various requirements seen in the core of enterprises span many infrastructure concerns, including “the installation, upgrades, administration, monitoring, and tuning of the database platform,” said Anthony Roach, senior product manager for MarkLogic. He predicts, however, that such oversight will be improved as software becomes more “introspective,” and predictive algorithms are employed to forecast and act on impending faults.

Add to these the virtually unlimited capabilities of the cloud, “which means you can react and re-route around those faults seamlessly,” Roach added. “The cloud will make database management a solved problem and the enterprise will take on the more critical task of data management—including security, privacy, lifecycle management, and more.” At this time, however, these requirements are “beyond the capabilities of current or proposed AI and machine learning systems.”

In the meantime, AI and machine learning solutions are increasingly playing roles in managing key aspects of data management. “Pointing AI at the databases themselves to auto-tune and optimize their behavior is becoming critical to successful data management as SaaS, hybrid, and edge-to-cloud databases become reality,” said Richard Beeson, CTO for OSIsoft. “To optimize the deployment and usage of massively distributed and dynamically available resources—compute, storage, networking—that underlie the management of data is becoming (or has become) untenable for any world-class system.” Databases will become autonomous, but only incrementally in 2020, Beeson explained. “For the average company—not SaaS


While completely AI-driven autonomous databases are still a ways off, there are a number of groundbreaking applications emerging. For example, the very idea of what constitutes a data model is changing. “Forget human schema concepts like star and snowflake,” said Paige Roberts, open source relations manager at Micro Focus. “Internal AI can design data models a lot better than people. The database is far more in touch with how it stores and retrieves data, and can measure to the microsecond.”

Data-driven business decisions “require smart human minds,” Roberts added. “People shouldn’t waste their valuable time worrying about table structures.” Data management has focused on the “static structure of data at rest, classically within a database management system,” added Naz Quadri, head of enterprise data science and quant development at Bloomberg L.P. Thanks to AI and machine learning, “people are starting to look at the dynamic structure found within data without imposing such static up-front restrictions. This should allow us to begin to harvest information from data sources previously thought unsuitable.”

AI and machine learning are providing other effective data management capabilities as well. For example, these technologies can “predict table growth to improve capacity planning,” said Thomas Fredell, chief product officer for Merrill Corp. Additional capabilities enhanced with AI and machine learning are “self-management, healing, resiliency through auto-tuning, and identifying anomalies,” he added. Ensuring data quality is another area, as AI can perform the mundane tasks of looking for bad data and offering suggestions for fixing the content.
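As a rough illustration of the table-growth prediction Fredell mentions, the sketch below fits a straight-line trend to daily table sizes and estimates how many days remain before a capacity limit is reached. It is a minimal sketch, not any vendor's method: the function names, the sample sizes, and the assumption of linear growth are all hypothetical, and a production system would likely use a richer model.

```python
def fit_line(ys):
    """Ordinary least-squares fit of y = slope * day + intercept (needs 2+ points)."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(daily_sizes_gb, capacity_gb):
    """Estimate days left before the table reaches capacity_gb.

    Assumes growth is roughly linear (a simplifying assumption).
    Returns None when the fitted trend shows no growth.
    """
    slope, intercept = fit_line(daily_sizes_gb)
    if slope <= 0:
        return None
    day_full = (capacity_gb - intercept) / slope
    last_day = len(daily_sizes_gb) - 1
    return max(0.0, day_full - last_day)


# Hypothetical sample: one size measurement per day, in GB.
sizes = [100, 102, 104, 106, 108]
remaining = days_until_full(sizes, capacity_gb=120)
print(remaining)
```

With these made-up numbers the table grows by 2 GB a day, so the estimate gives the planner about six days of headroom before the 120 GB limit.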

Configuration and workload management are also candidates for AI-driven data management. “Individual configuration settings can be augmented by—or even completely driven—using machine learning algorithms,” said SAP’s Kazmaier, who stated that recently AI and machine-learning techniques—including mathematical optimization, deep learning, and reinforcement learning—have come to the fore. In addition, AI and machine learning “are driving the detection of security breaches based on abnormal data access, such as reading large amounts of data or atypical selection criteria.”
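One simple way to picture the abnormal-access detection Kazmaier describes is to compare a new read against a user's own history. The sketch below flags a read whose row count sits far above that baseline, using a z-score. The function name, the threshold of 3.0, and the sample numbers are assumptions for illustration only; real systems would weigh many more signals, such as the atypical selection criteria mentioned above.

```python
from statistics import mean, stdev


def is_abnormal_read(history_rows, new_rows, z_threshold=3.0):
    """Return True when new_rows is far above the user's usual read size.

    history_rows: row counts from the user's past queries (hypothetical data).
    z_threshold: how many standard deviations above the mean counts as
                 abnormal (an assumed cutoff, tune for your workload).
    """
    if len(history_rows) < 2:
        return False  # not enough history to judge
    mu = mean(history_rows)
    sigma = stdev(history_rows)
    if sigma == 0:
        return new_rows > mu  # any increase over a perfectly flat baseline
    return (new_rows - mu) / sigma > z_threshold


history = [1000, 1200, 900, 1100, 1000]
print(is_abnormal_read(history, 50_000))  # bulk read well outside the baseline
print(is_abnormal_read(history, 1150))    # within normal variation
```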

Query management is another potential area for AI, as it can be “used to identify potential root causes of long-running queries,” said Fredell. AI can also flag and potentially throttle users placing a disproportionately large load on the database, he added. Load patterns can be identified and preemptively scaled at certain times of the day, or days of the week, to provide a better user experience and effectively control costs.
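The idea of flagging users who place a disproportionate load on the database can be sketched with a simple share-of-total rule. This is a deliberately basic stand-in for the AI-driven approach Fredell describes: the log format, the function name, and the 50% threshold are all hypothetical.

```python
from collections import Counter


def flag_heavy_users(query_log, share_threshold=0.5):
    """Return users whose share of total query time exceeds share_threshold.

    query_log: iterable of (user, seconds) pairs (an assumed log format).
    """
    load = Counter()
    for user, seconds in query_log:
        load[user] += seconds
    total = sum(load.values())
    if total == 0:
        return []
    return sorted(u for u, s in load.items() if s / total > share_threshold)


# Hypothetical query log: (user, query duration in seconds).
log = [("alice", 5.0), ("bob", 120.0), ("carol", 10.0), ("bob", 65.0)]
heavy = flag_heavy_users(log)
print(heavy)
```

A flagged user could then be reviewed or throttled, ideally with a human approving the action first.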

In addition, AI and machine learning “can help catalog the massive amounts of data and metadata being produced today, and use that data to forecast outcomes and prevent failures,” said Kiran Chitturi, CTO architect at Sungard Availability Services. “AI and ML can also improve customer experience and support by quickly and intelligently answering queries from data lakes, while also analyzing customer sentiment. It can also help ensure smarter data governance, maintain data privacy, and monitor for privacy regulations as well.”


Industry experts are divided on the ultimate impact of AI and machine learning on data-related jobs, particularly among data professionals. “AI and machine learning will definitely impact data management jobs, but not in the way that many people are fearful of,” said Carr. “Instead, by complementing the skill sets of existing employees, AI and machine learning will help to ‘upskill’ the processes of these employees.” As a result, Carr continued, “we are going to see more diverse backgrounds and skill sets in data architects and IT staff, coupled with backgrounds in everything from hard sciences to social science, law, and many other fields.”

The role of the DBA will be changing as well, Fredell predicts. “There will be fewer traditional DBAs in the future, as many of these tasks will be automated through AI and machine learning,” he said. “The new DBA role will be expected to utilize AI- and ML-powered tools to simultaneously manage a much larger number of database instances than they do today. Data analysts will spend less time on troubleshooting and data cleanup and increase their focus on data management, stewardship, quality, delivery, and creating value from the database content rather than managing it.”

“Traditionally, data management departments and staff have been tasked with reporting a historic view of the business and data for management to learn and make predictive decisions,” said Ed Macosky, senior vice president of product, UX, and solutions at Boomi, a Dell Technologies business. “Data management departments and staff are now being brought in more strategically to assist with a forward-looking view of the organization and to provide guidance on data management to predict the future.”

As AI and machine learning take hold, databases will require “less and less time and effort to manage as the software gets smarter about managing itself,” said Roberts. “Data analytics and the automated decision making of AI are becoming less a competitive edge and more table stakes to stay in the competitive game in nearly every industry. Smart people who can figure out the best way to combine, analyze, and put data to work will always be in demand.”


Along with the shift or displacement of jobs, AI and machine learning introduce other types of risks into data environments. Since AI and machine learning are relatively new on the scene and require new processes, tools, and skills in data environments, there is the need for careful planning and deliberation, industry experts advise. Boomi’s Macosky cautioned against rushing into AI and machine learning too quickly, as the challenge with implementing these technologies is to “define proper architectures without impacting production operations.” AI and machine learning operations “can be technically intensive, and you want to be careful that you don’t impact the UX of the daily users of the data.”

AI and machine learning operations also “require a lot of computing power, resulting in additional costs on top of the existing business intelligence infrastructures already in place for an organization today,” Macosky added. In addition, there is a risk that the wrong data may be analyzed, leading to conclusions that are not helpful to business outcomes. “This leads to some risk with implementing AI and machine learning for BI,” Macosky said.

Along these lines, industry experts raised concerns about the quality of AI-driven decisions. “The system could make the wrong, or suboptimal, decision,” said Kazmaier. “This could be, for example, selecting the wrong objectives or the system’s inability to optimize when there are competing objectives, both of which would result in poor service levels. The challenge, therefore, lies in equipping database systems with the knowledge and tools to harness automated technologies and build trust in AI and machine learning-driven components. DBAs will need to have a strong understanding of the decision-making capabilities of ML- and AI-based models to qualify the risks of automated decisions.”

Humans need to stay in the process, and there will still need to be “human checks and balances—especially if the DBAs are not well-versed on how AI handles tuning,” said Fredell. “While some tasks can be fully automated with AI and ML, others will need to be reviewed by a human for approval before being committed. Robust logging, reporting, and rollback will be essential if automated changes need to be undone.”

AI and machine learning may also add to the problem of technical debt, said Chris Bergh, CEO of DataKitchen. “Machine learning tools are evolving to make it faster and less costly to develop AI systems. But deploying and maintaining these systems over time is getting exponentially more complex and expensive. Data science teams are incurring enormous technical debt by deploying systems without the processes and tools to maintain, monitor, and update them. Further, poor quality data sources create unplanned work and cause errors that invalidate results.”

Clarity and explainability also must be addressed, and this is increasingly a concern for organizations that need to trust the results their systems produce. “When you employ machine learning algorithms to make decisions, by their very nature, the decision process is opaque as compared to a rules-based system,” said Roach. “They essentially write their own rules based on the data so that data needs to be accurate, well-understood, and appropriate for the purpose.” As a result, the reliance on good data will be no different than it is today, said Roach. “The fact that an ML system may process orders of magnitude more in a black box increases the danger.”

Many challenges, however, are in the process of being addressed, according to Saif Ahmed, product owner of machine learning at Kinetica. “First, there are so many ways to learn the skills needed to work with AI and machine learning that it’s hardly a specialty at most companies. Google and the other big tech companies may still have access to the most cutting-edge data researchers, but for the rest of us, it’s not an impossible task to find employees with the skills in the right wheelhouse. Secondly, every year, the amount of computing power you get for the same dollar goes up, while the amount you need for a successful ML project goes down. This means funding and housing the necessary computing power is much more feasible. Lastly, data is being recognized across industries as a crucial business asset. There are entire startup ecosystems built around data—data cleaning, data capture, data-cleaning-as-a-service—it’s not an obscure resource anymore.”


AI-managed data environments may relieve data staff of many low-level tasks, but getting there requires having the right skill sets across the enterprise. “There are still skills that any professional within data management and analysis should always be honing,” said Carr. “For example, data engineers and data scientists who have an understanding of the business objectives will be needed.” Even for those who stay in traditional IT, “dealing with the evolution of IT application development and operations IT management tools will require an understanding of AI/ML as it will be embedded in all the product road maps for what they are currently using,” Carr added.

Data managers, engineers, and admins should expand their skill sets to include “a general understanding of AI and ML modeling concepts and use cases, ranging from classic predictive and ML to deep learning frameworks,” said Kazmaier. “Their expertise and support will be needed in the integration and operationalization of AI and machine learning scenarios in applications.” AI and machine learning are also bringing about increased demand for “data scientists who will help steer and support AI and machine learning initiatives in system operations,” he said.

Such advanced skills are necessary, as AI development and deployment is a complex workflow, said Bergh. “If executed manually, it is slow, error-prone, and inflexible. The actual output of the model development process—a set of files, scripts, parameters, source code—is only a small fraction of what it takes to deploy and maintain a model successfully. In addition to ML code, there is data collection, data verification, feature extraction, serving infrastructure, monitoring, and more. The technology and infrastructure supporting the model are actually more substantial and critical than the model itself.”

Being able to analyze data is one thing, “but the ability to present the findings in an easily digestible way will help set you apart from your peers,” said Alex Ough, senior architect-CTO, Sungard Availability Services. “Even if the data analysis is well done, it will not make a difference if you cannot present it to others, especially people without related knowledge. It’s critical to learn how to tell a story based on the findings, and then visually present that story in the most efficient way possible.”

Start small, think big, Ahmed advised. “The ones who will do machine learning well are going to be the ones who stay focused. Most organizations have thousands of use cases where machine learning could be applied. By picking specific issues like the optimal number of shipments or pricing, you can see more significant results. Smart adopters will tackle the low-hanging fruit and work their way up. And some of these smaller, easier projects can come with big dollar signs that make it clear whether or not your model is working. If you can measure your work, you can sustain your project.”