Business people, and the public worldwide, are increasingly excited and concerned about the potential of artificial intelligence (A.I.). China’s growing involvement in A.I., and the vast quantity of data that China can generate daily, have many wondering whether the U.S. will be a leader or a follower in this important technology category as the future unfolds.
Data is the fuel that feeds A.I. The more data you have, the more A.I. can learn and adapt. Most feel it’s all about the quantity of data. In both my international speeches and my consulting, I have been saying that data quantity is good, but not if the quality is bad, and that this concern should be front and center for anyone involved in A.I.
The value and quality of the data used by A.I. are key to getting not only the best but also valid insights and suggested actions from A.I. and machine learning applications. If the data is of questionable quality, so too will be the results and analyses produced by A.I.
As A.I. grows, so should our focus on the reliability of the data on which A.I. depends.
China Catches Up
As far back as October 2016, China had overtaken America in the number of published journal articles on “deep learning,” an increasingly important form of A.I. Additionally, PwC, a consulting concern, predicted that A.I.-related growth will boost global GDP by roughly $16 trillion by 2030, with roughly half of that growth occurring in China.
All this is certainly eye-catching news. But it is not surprising. If nothing else, there’s a lot more data to be gleaned in China simply because its population is so much larger than that of the U.S. And the resulting insights from A.I. will undoubtedly prove useful throughout the world, not just in China.
But, to me, it also has a certain feel of gold rush recklessness to it—a mad dash to be the first, the biggest, the most involved.
There’s certainly nothing wrong with that. But, at the same time, I also firmly believe we need to remain focused on the integrity of the entire process. And that boils down to the information without which A.I. simply cannot function.
If Trash Goes In, What Comes Out?
A well-worn axiom summarizes the situation at hand: “Garbage in, garbage out.” It’s a simple, reliable formula. The end product of anything depends on the quality and integrity of whatever went into making it.
That’s particularly critical with A.I. As we all know, there’s good data and there is bad data. Bad data can come from many causes, including mistakes during entry. And the quality of the data drives the value and relevance of the analytic output of A.I.
Some may argue that quantity of data is far more important—in effect, that “it’ll all work out in the end” by good data rising to the top.
Recent history has already shown that this view rests far more on blind faith than on genuine confidence.
Not Just Bad, Outdated
Of further concern is data that, at one point, may have been accurate but has lapsed into irrelevance. That makes “legacy data” every bit as much a point of concern as data that, however current, is simply wrong.
One strategy to help ensure data accuracy and integrity is to analyze as much of the data as possible. As this article points out, a business that looks at only part of a dataset is more likely to miss some of the smaller details.
Additionally, do sufficient legwork when collecting data for analysis. Analyses can often be thrown off if the source generating the data is unreliable or if the context in which the data is generated isn’t taken into account.
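For readers who want to see what such checks might look like in practice, here is a minimal sketch in Python. It is an illustration only, not a reference to any particular product or method: the record shape (`Reading`), the function names, the one-year age limit, the 0 to 100 valid range and the idea of a "trusted sources" list are all assumptions made up for this example. It reviews every record rather than a sample and flags data that comes from an unvetted source, is outdated, or looks like an entry mistake.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical record shape; the field names are illustrative assumptions.
@dataclass
class Reading:
    source: str        # who or what produced the value
    recorded_on: date  # when the value was captured
    value: float       # the measurement itself

def quality_issues(reading, today, trusted_sources,
                   max_age_days=365, valid_range=(0.0, 100.0)):
    """Return the reasons this reading should not be trusted as-is."""
    issues = []
    # Source check: was the data generated by a source we have vetted?
    if reading.source not in trusted_sources:
        issues.append("unreliable source")
    # Staleness check: once-accurate data may have lapsed into irrelevance.
    if today - reading.recorded_on > timedelta(days=max_age_days):
        issues.append("outdated")
    # Range check: catches many simple data-entry mistakes.
    low, high = valid_range
    if not (low <= reading.value <= high):
        issues.append("out of range (possible entry mistake)")
    return issues

def split_by_quality(readings, today, trusted_sources):
    """Check every reading (not a sample) and separate clean from flagged."""
    clean, flagged = [], []
    for reading in readings:
        issues = quality_issues(reading, today, trusted_sources)
        if issues:
            flagged.append((reading, issues))
        else:
            clean.append(reading)
    return clean, flagged
```

In a sketch like this, only the `clean` list would be passed on to an A.I. or machine learning application, while the `flagged` list, with its reasons attached, would go back to a person for review. The thresholds would of course depend on the business and the kind of data involved.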
All this is of particular importance when looked at in the context of the Internet of Things (IoT), through which sensors collect data and machines communicate with other machines. If flawed information or analysis derived from faulty data enters through IoT and is subsequently shared, unraveling the entire string of misinformation could prove a major headache.
Yes, A.I.’s capacity to analyze and learn is changing the way the world works, plays and communicates. But let’s make sure that the data and information that affect all those areas are as reliable as they possibly can be.