
    Big data case study of a communications company

    Category: Data Analysis

    Time: 2018-12-26 16:40:12



    Case background:

    The traffic analysis and control system of XXX communications company faced a dilemma: the data volume was very large, but the budget was very limited, and an ordinary database-based analysis system could not support the load at all.

    Problem-solving steps

    1. Proposed test scheme:

    Import about 300 days of data, roughly 5 billion records, into the cluster, then build customized dashboard analysis on top.
    Because of the limited budget, the hardware was a customized PC cluster of 10 nodes (1 CPU, 8 cores each).

    2. Demo:

    The demo's working principle and presentation capability were enough to confirm, at the functional level, that the project was feasible.
    The response speed of multiple queries and of multi-user concurrent access was tested under a large data volume (see the sketch below); the results met the requirements.
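
    A minimal sketch of what such a concurrency test might look like, assuming a hypothetical run_query() helper that executes one dashboard query against the cluster; the client count and query list are illustrative, not details from the case:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(sql: str) -> float:
    """Hypothetical stand-in: execute one dashboard query, return latency."""
    start = time.perf_counter()
    # ... send sql to the cluster's query interface here ...
    return time.perf_counter() - start

# Placeholder queries of the kind the dashboard would issue.
QUERIES = ["SELECT ..."] * 20

# Simulate 10 concurrent users firing queries and collect latencies.
with ThreadPoolExecutor(max_workers=10) as pool:
    latencies = list(pool.map(run_query, QUERIES))

print(f"max {max(latencies):.2f}s, avg {sum(latencies) / len(latencies):.2f}s")
```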

    3. The first phase of technical service support:

    Log parsing: not just selected file blocks, but all log files under the entire file system.
    Cleaning: dimension association, dimension cleansing, date cleansing, and so on (see the sketch after this list).
    Application display: monthly, daily, and yearly aggregations across the different dimensions.
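
    As a rough illustration of the cleaning step, a pandas sketch; the file names, the cell_id key, and the dim_cell dimension table are assumptions for illustration, not details from the case:

```python
import pandas as pd

# Hypothetical inputs: parsed log records and a dimension table.
logs = pd.read_csv("raw_log.csv")        # assumed detail records
dim_cell = pd.read_csv("dim_cell.csv")   # assumed cell_id -> city, carrier

# Date cleaning: normalize timestamps, drop rows that fail to parse.
logs["ts"] = pd.to_datetime(logs["ts"], errors="coerce")
logs = logs.dropna(subset=["ts"])

# Dimension cleaning: drop records whose key has no dimension entry.
logs = logs[logs["cell_id"].isin(dim_cell["cell_id"])]

# Dimension association: join dimensional attributes onto the detail rows.
detail = logs.merge(dim_cell, on="cell_id", how="left")

# The display layer then groups by day, month, or year per dimension.
monthly = detail.groupby([detail["ts"].dt.to_period("M"), "city"])["bytes"].sum()
```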

    4. A serious problem arises:

    One day's data is divided into N links and 388 data blocks, one block every 6 minutes.
    For one day's data, the raw DAT file is about 5 GB; loaded into the relational database it becomes roughly 30 GB, at least 500 million records.
    The problem: the 300-day data volume exceeds 50 billion records, 6-7 times the originally estimated volume.

    5. Solving the problem:

    Reduce the dimensionality: summarize at two-hour granularity, adding a two-hour bucket field to the detail data (sketched below).
    The 3 days of detail data, about 20 GB, are split into App and non-App data; the total shrinks to about 2 GB, a tenfold reduction. The front end was refactored accordingly.
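
    A minimal sketch of the two-hour summarization, again with hypothetical pandas column names (ts, app_flag, bytes, city, carrier are all assumptions):

```python
import pandas as pd

# Hypothetical detail data with illustrative columns.
detail = pd.read_parquet("detail_3days.parquet")

# Add the two-hour bucket field to the detail data.
detail["bucket_2h"] = detail["ts"].dt.floor("2h")

# Split detail into App and non-App sets, as the case describes.
app_detail = detail[detail["app_flag"]]
non_app_detail = detail[~detail["app_flag"]]

# Dimension reduction: long history is kept only as two-hour summaries.
summary = (detail
           .groupby(["bucket_2h", "city", "carrier"], as_index=False)["bytes"]
           .sum())
summary.to_parquet("summary_2h.parquet")
```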

    6. The final scheme:

    Configure 180 GB of JVM memory.
    Hardware: 10 PCs, each with 32 GB of memory and 1 CPU with 8 cores.
    Centralized load of historical data: label records by two-hour bucket, join against the dimension tables to generate detail data, then aggregate it into the database.
    Automatic incremental load: import data every 5 minutes and generate summary data every two hours; the system maintains 3 days of detail data and 300 days of aggregated data for front-end consumption (see the sketch after this list).
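
    A sketch of the incremental loop under the stated schedule; import_increment(), build_2h_summary(), and prune_detail() are hypothetical helpers standing in for the real loading jobs:

```python
import time
from datetime import datetime, timedelta

DETAIL_RETENTION = timedelta(days=3)   # keep 3 days of detail data

def import_increment() -> None: ...    # load the last 5 minutes of records
def build_2h_summary() -> None: ...    # aggregate the just-closed 2h bucket
def prune_detail(cutoff) -> None: ...  # drop detail rows older than cutoff

last_summary = datetime.now()
while True:
    import_increment()                        # runs every 5 minutes
    now = datetime.now()
    if now - last_summary >= timedelta(hours=2):
        build_2h_summary()                    # runs every two hours
        prune_detail(now - DETAIL_RETENTION)  # enforce 3-day detail window
        last_summary = now
    time.sleep(5 * 60)
```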

    7. Front-end display: the traffic analysis and control system:

    Traffic statistics of each network outlet, broken down by local city and sub-carrier.
    Flow direction of each network outlet, broken down by operator and province.
    Traffic volume of each network outlet, broken down by local market.
    TOP-N ranking of each network outlet's traffic, divided into major categories and specific applications.
    TOP-N ranking of hot domain names (see the sketch after this list).
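
    For the TOP-N views, a small pandas sketch showing how the hot-domain ranking could be computed, assuming the two-hour summary table also carries a domain dimension (names illustrative):

```python
import pandas as pd

# Hypothetical two-hour summary table with a domain dimension.
summary = pd.read_parquet("summary_2h.parquet")

# TOP-N hot domains by total traffic over the loaded time window.
N = 10
top_domains = (summary
               .groupby("domain", as_index=False)["bytes"].sum()
               .nlargest(N, "bytes"))
print(top_domains)
```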

    Case test results:

    The data volume was very large: more than 50 billion log records over 300 days.
    The budget was very limited: 10 PCs were invested, roughly 100,000 RMB of hardware, so the solution was very cost-effective.
    Log parsing was the more difficult part, and the dimension-reduction requirement raised the difficulty of the presentation layer.
    Multiple levels of optimization were carried out to achieve ten-second-level interactive response.
